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ABSTRACT 

This report, profusely illustrated with color 
photographs and other graphics, elaborates on the Department of 
Energy (DOE) research program in High Performance Computing and 
Communications (HPCC). The DOE is one of seven agency programs within 
the Federal Research and Development Program working on HPCC. The DOE 
HPCC program emphasizes research in four areas: (1) HPCC 
systems— evaluate advanced architectures for large-scale scientific 
and engineering applications; (2) Advanced Software Technology and 
Algorithms — research computational methods, algorithms, and tools; 
develop large-scale data handling techniques; establish HPCC research 
centers; and exploit extensive teaming among scientists and 
engineers, applied mathematicians, and computer scientists; (3) 
National Research and Education Network — research and develop 
high-speed computer networking technologies through a multiagency 
cooperative effort; and (4) Basic Research and Human 
Resources — encourage research partnerships between national 
laboratories, industry, and universities; support computational 
mathematics and computer science research; establish research and 
educational programs in computational science; and enhance K through 
12 education. The stated goals of the DOE HPCC program are to support 
the economic competitiveness and productivity of the United States, 
accelerate the application of HPCC technology to the solution of 
scientific and engineering problems, and enhance U.S. leadership in 
research, development, and deployment of HPCC technologies. A number 
of examples of DOE HPCC operations are provided, including the Matvu 
matrix visualization computer software package; Global Ocean Model; 
Molecular Dynamics Simulations; Pion Propagator; Three-Dimensional 
Tokamak Modeling; MediaView, a system for multimedia communication; 
and the CASA testbed, a wide-area, very high speed communication 
network that enables a number of collaborating agencies to work with 
geographically dispersed supercomputing resources. (DB) 
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EXECUTIVE SUMMARY 



The Department of Energy (DOE) High Performance Computing and
Communications (HPCC) Program is one of seven agency programs
within the Federal Research and Development Program on HPCC that
was proposed with the President's Fiscal Year (FY) 1992 Budget and
described in the supplemental report to the budget entitled Grand
Challenges: High Performance Computing and Communications.¹

The overall Federal HPCC Program is coordinated by the Federal
Coordinating Council for Science, Engineering, and Technology
(FCCSET) Committee on Physical, Mathematical, and Engineering
Sciences (PMES) through its subcommittee on High Performance
Computing, Communications, and Information Technologies (HPCCIT).
The Grand Challenges report¹ describes the programs and
interrelationships of the Federal agencies that are participating
in the HPCC Program.

The purpose of this document is to elaborate 
on the DOE research program in HPCC. 

DOE HPCC GOALS 



To support United States economic
competitiveness and productivity through
interdisciplinary research and human resource
development.

To accelerate the application of HPCC
technology to the solution of scientific and
engineering problems of significant
departmental interest.

To enhance United States leadership in 
research, development, and deployment of 
HPCC technologies. 

STRATEGY 



Support the underlying research, network,
and computational infrastructures on which
HPCC technology is based.

Develop the human resource base to meet 
the growing needs of industry, academia, and 
government in the area of HPCC. 

PROGRAM DESCRIPTION 



The DOE HPCC program will emphasize 
research in each of the following four key
areas. Selected computational challenges,
which have a significant effect on national 
leadership in science and technology, will be 
used as focal points for these efforts. 

HPCC Systems — evaluate advanced archi- 
tectures for large-scale scientific and engineer- 
ing applications. 

Advanced Software Technology and Algo- 
rithms — research computational methods, algo- 
rithms, and tools; develop large-scale data-han- 
dling techniques; establish HPCC research cen- 
ters; exploit extensive teaming among scientists 
and engineers, applied mathematicians, and 
computer scientists. 

National Research and Education Net- 
work — research and develop high-speed net- 
working technologies through a multiagency 
effort. 

Basic Research and Human Resources — en- 
courage research partnerships between national 
laboratories, industry, and universities; support 
computational mathematics and computer science
research; establish research and educational
programs in computational science; enhance
K through 12 education.



Support computational advances through in- 
creased research and development efforts in ar- 
eas of traditional strength in the department. 

Promote the use of department and
department-supported facilities as a market for
HPCC prototypes and commercial products.






On February 5, 1991, Dr. Allan Bromley announced the FY 1992 U.S.
Research and Development Program for high performance computing
and communications to Congress. This HPCC Program, described in
the supplement¹ to the President's FY 1992 budget, is the
culmination of several years of effort on the part of senior
scientists and managers in government, academia, and industry
examining the state of U.S. high performance computer and network
technology. The program recommends increased federal spending by
the Department of Commerce, the Defense Advanced Research Projects
Agency (DARPA), the DOE, the Environmental Protection Agency
(EPA), the Department of Health and Human Resources, the National
Aeronautics and Space Administration (NASA), and the National
Science Foundation (NSF) for research in advanced computer
technologies to develop dramatically more capable supercomputers,
more powerful software capabilities, and high-speed computer
networks. This DOE document describes and discusses only the
potential DOE initiatives in response to the program plan and
funding recommendations.

DOE HPCC PROGRAM HISTORY 

Because of its mission and the computationally intensive nature of
energy-related applications and problems, the DOE mission depends
on advancements in computational techniques and computer and
networking technologies. As a result, DOE has a long history of
computational research and development, with strong industrial and
university cooperation. The current DOE Applied Mathematical
Sciences Program came about as the result of a suggestion by John
von Neumann to enhance understanding of the use of digital
computers in nuclear applications. Consequently, DOE has been
prominent in maintaining the U.S. leadership in HPCC, in
encouraging (and even providing) innovation in HPCC technologies,
and in supporting U.S. competitiveness and productivity through
its extensive use of HPCC technologies. The table below outlines a
history of DOE supercomputing hardware involvement in which the
DOE national laboratory system has worked with U.S. vendors to
bring promising and innovative computing technologies to bear on
departmental applications.

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Year / System / Site 

1945 / Eniac / U. of Pennsylvania; AEC(DOE)/LANL, first user
1952 / Maniac / AEC(DOE)/LANL, first operational fabrication of von Neumann design
1953 / IBM 701 / AEC(DOE)/LANL, first customer installation
1956 / IBM 704 / AEC(DOE)/LANL, first customer installation
1959 / LARC / AEC(DOE)/LLNL, prototype system
1961 / IBM 7030 Stretch / AEC(DOE)/LANL, prototype; AEC(DOE)/LLNL, Serial 1
1963 / CDC 3600 / AEC(DOE)/LLNL, Serial 1
1964 / CDC 6600 / AEC(DOE)/LLNL, Serial 1
1969 / CDC 7600 / AEC(DOE)/LLNL, Serial 1
1974 / CDC Star 100 / AEC(DOE)/LLNL, Serial 1
1976 / MFENET / ERDA(DOE)/NMFECC at LLNL, first nationwide supercomputer network
1976 / CRAY 1 / ERDA(DOE)/LANL, Serial 1
1981 / IBM 3081 / DOE/SLAC, first customer installation
1982 / CDC Cyber 205 / DOE/KAPL, Serial 3, first U.S. installation
1984 / CRAY 2/1 / DOE/NMFECC at LLNL, Serial 1 quadrant
1985 / CRAY 2/4 / DOE/NMFECC at LLNL, Serial 1 four processor
1987 / ETA-10E / DOE-sponsored SCRI at FSU, Serial 1
1988 / IBM / DOE/LANL, first HIPPI prototype
1989 / ETA-10G/4 / DOE/FSU, first and only 7-ns machine
1989 / CM-2 / DOE/LANL, first 64K floating point machine
1990 / CRAY 2/8 / DOE/NERSC, first 8-processor CRAY 2
1990 / Intel i860/128 / DOE/ORNL, Serial 1 Touchstone Gamma
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Eniac computer 



DOE laboratories are among the pioneers of 
the network-based shared-resource architecture 
of the supercomputer environment and its sup- 
porting supercomputer system software. Ex- 
amples include the first timesharing operating 
system for supercomputers, three generations of 
mass storage servers, a supercomputer Fortran 
environment including optimizing compilers, a 
portable Fortran mathematical subroutine li- 
brary, interactive debugging, vector extensions, 
and a portable operating system interface
library. The laboratories have also pioneered
computational science and the supporting
mathematical techniques and libraries.

The exploitation of state-of-the-art, innova- 
tive supercomputer technology has become 
more and more challenging. The complex, mas- 
sively parallel computer architectures needed to 
provide the requisite computer power to solve 
forefront energy research problems in the next 
decade will present difficult computational re- 
search problems. These will require a rethink- 
ing of disciplinary and organizational bound- 
aries and will demand interdisciplinary team- 
work on a scale that is unprecedented in recent 
memory. 



As an example, consider the computational 
challenges posed by the various energy systems 
and materials of the automobile. The DOE mis- 
sion to provide and to implement a national en- 
ergy strategy must include major efforts to re- 
duce automotive gasoline consumption. 

There are many energy-related problems — 
all of which are computational challenges — in- 
volved in developing more fuel efficient, envi- 
ronmentally sound, and safer automobiles. 
These include the modeling of the combustion
systems, the materials and structure, the
aerodynamics, the control of materials processing
and components manufacture, and the use of al- 
ternative energy sources such as chemical bat- 
teries or solar technologies. Any one of these 
individual energy applications saturates super- 
computer systems today. In order to develop 
and computationally experiment with the com- 
plex, integrated models required to advance the 
understanding of these problems, researchers
will need to use the teraflops computer systems
proposed for development in the Federal HPCC
Program. It will also be necessary to bring
together teams of scientists and engineers,
applied mathematicians, and computer scientists
to develop the software technology and
algorithms for these difficult problems. High
Performance Computing Research Centers
(HPCRCs) will serve as focal points for these
teams. In addition to addressing grand
challenge applications problems, the activities of
individual HPCRCs may include using early
versions of advanced computers for software
development, providing feedback to system
developers, and providing network access to
the broader research community.

The DOE has long recognized the need for 
computer networks to support remote access to 
its supercomputer systems and to support dis- 
tributed research collaborations. In addition to 
supercomputers, the DOE is responsible for 
highly sophisticated facilities such as the Su- 
perconducting Supercollider, synchrotron 
light sources, neutron sources, and electron 
microscopes to probe the atomic and molecu- 
lar structures for materials, medical, chemical, 
biological and pharmacological research. 
High-speed communications access to such fa- 
cilities is necessary to enhance the quality and 
scope of research collaborations. Over 15
years ago, the DOE implemented national
computer networking access to supercomputers.
And more than 25 years ago, DOE laboratories
pioneered high-performance local network
access and the client-server model of
shared network services such as mass storage,
I/O, authentication, graphics, and terminal
access. Because of this development and systems
integration experience and because of its mis- 
sion to operate these forefront facilities, DOE 
is in a unique position to contribute greatly to 
the advancement of U.S. science through the 
networking component of the HPCC Program. 

The Federal HPCC Program includes a multiagency, multigigabit
network research initiative. The DOE national laboratories have
traditionally researched very high-speed networking technologies
in cooperation with U.S. vendors to provide high-speed data
transfer for supercomputers and for experiment control systems
applications. Most recently, DOE has worked in collaboration with
industrial partners to develop gigabit networks and interface
technology that increase communications bandwidth by two orders of
magnitude. Therefore, DOE also expects to contribute to the HPCC
Program gigabit testbed projects, especially with regard to
gigabit applications, protocols, and systems software.

In recent years, traditional theoretical and experimental
techniques in science and engineering have been augmented by a
powerful new technique: computational science. The term
"computational science" is used to describe those intellectual
activities in science, engineering, mathematics, and computer
science that develop or exploit HPCC as an essential tool. While
theory and experimental methods will not be replaced by
computational science, they are being complemented by it in
important ways. Currently, computational science educational
programs are emerging in a small number of American universities.
Having recognized the need for long-term investment in basic
research and education in computational science, the DOE national
laboratories have long been leaders in developing computational
science techniques and in training limited numbers of postdoctoral
researchers. DOE intends to contribute further in this vital area
at all educational levels with a varied program, including teacher
workshops in computational science, assistance with curriculum
development, and access to high-performance computers for
educational programs.

The DOE also recognizes that, in order to 
bring the full benefits of the HPCC Program to 
the national industrial complex and other enti- 
ties concerned with proprietary or privacy in- 
formation, proper security measures must be 
included in the program from the start. The 
DOE national laboratories have decades of ex- 
perience in dealing with complex computing 
environments that process many levels of sen- 
sitive information. 

The following sections provide detail on
the research initiatives proposed by DOE as
part of the Federal HPCC Program.



HIGH-PERFORMANCE COMPUTING SYSTEMS 



MASSIVELY PARALLEL COMPUTERS 



The Federal HPCC Technology Development Program in High
Performance Computing Systems will be primarily performed and
coordinated by DARPA, as described in the Grand Challenges
report.¹ The DOE will participate in technology development,
primarily with regard to evaluating advanced architectures for
large-scale scientific and engineering applications.

The DOE has long played an influential role in pushing the
development of high performance computers. DOE applications are
computationally and memory intensive. They also push the state of
the technology for hardware interconnects to supporting
communications, storage, and output devices. DOE has a natural
interest in development of the foundations for the future
generations of computing that will lead to advances in raw speed
and capacity support for trillions of operations per second
(teraflop) computing requirements. DOE also has an interest in
encouraging the development of the next generation of machines so
that architecture advances can be integrated into effective,
usable computing environments. An important long-term goal that
will guide DOE research in this area is to foster development of
high performance computational facilities that are architecturally
balanced in a way to make them usable by general DOE applications.

DOE is specifically interested in research 
that will lead to understanding the architectural 
limitations that will shape future machines, 
understanding how basic parallel software sup- 
port is impacted as the architecture scales to 
higher performance, and providing resources 
for research in developing basic components 
of future computing environments. 



Future generations of machines are expected to rely heavily on
architectural parallelism. Identifying and understanding the
limiting factors in parallel computing equipment is essential in
establishing realistic upper bounds on the requirements being
placed on machine design and supporting technologies.
Computational models that require true teraflop solution speeds
will saturate the underlying hardware in many areas. We need to
understand where these saturation points occur and seek to
establish the means to overcome them. The DOE will conduct
research to determine the effects of increasing parallelism in
both hardware and software. Among the many issues to be addressed
are the impact of code size on computational requirements, the
effect of various languages and models of computation on the
amount of usable parallelism made available, the impact on
reliability from increasing hardware complexity, the need for
operating system features that optimize the use of the hardware,
and the increased demands placed on supporting technologies.

Massively parallel computers use an ap- 
proach to achieving speed that is radically dif- 
ferent from that used in today's production 
supercomputers. In massively parallel comput- 
ers, many hundreds or thousands of simpler 
and less expensive processors are coupled with 
a large amount of memory. These processors 
team up to carry out computations in parallel 
by breaking the problem into many subtasks 
that can be carried out simultaneously by the 
individual processors. The processors exchange
information as needed to complete the
computation. In this way computations can be 
carried out many times faster on the massively 
parallel computer than on one of its individual 
processors. 
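
The decomposition described above can be illustrated with a minimal sketch. It is not drawn from any DOE code: the function names, the sum-of-squares workload, and the use of a thread pool are assumptions chosen only to show the pattern of splitting a problem into subtasks, working on them concurrently, and combining the partial results. Threads in standard Python do not give a true speedup for arithmetic of this kind, so the sketch shows the structure of the computation rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(values, n_workers):
    """Break the problem into n_workers contiguous chunks of nearly equal size."""
    size, extra = divmod(len(values), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work carried out independently by one (hypothetical) processor."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Run the subtasks concurrently, then combine the partial results."""
    chunks = split_into_subtasks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # The only exchange of information in this example is the final reduction.
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(parallel_sum_of_squares(data, n_workers=8))  # 333833500
```

Real applications rarely decompose this cleanly: most require the processors to exchange intermediate values during the computation, and the cost of that exchange is one of the limiting factors discussed earlier.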



There are many different ways to link up processors and memory in
massively parallel computers. This adds to the flexibility of the
computer but also complicates the job of computer design: system
architects are faced with many alternatives, and choosing the
"best" ones can be a difficult task and may be strongly
application dependent.

The current generation of massively parallel computers has been
successfully used to solve selected applications problems as fast
as or faster than the best vector supercomputers. Because of
improvements in manufacturing technology, it is forecast that in
two years these machines will calculate ten times faster than they
do now; in five years they may be hundreds of times faster.

To see why this speed is needed, consider an example of
three-dimensional flow simulation used for global climate or
reservoir modeling. Current models, in which the flows are
calculated by repeatedly evaluating equations millions of times,
may not be sufficiently accurate. Nevertheless, these calculations
can take hundreds of hours on a conventional supercomputer.
Accurate climate models could require solutions of equations
hundreds of millions of times at each time step of the simulation.
Completing such calculations would require years of supercomputer
time. For weather forecasting, we need the data in hours.
Massively parallel computers currently being designed could do
these demanding calculations in days; massively parallel computers
proposed to be developed under the HPCC Program could meet this
need.
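
The step from "hundreds of hours" to "years" can be checked with rough arithmetic. The sketch below assumes that run time grows in proportion to the number of equation evaluations. The specific figures (200 hours, two million and two hundred million evaluations, a four-hour forecast deadline) are illustrative placeholders within the ranges quoted above, not measurements.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

# Illustrative placeholder values (assumptions, not data from the report).
CURRENT_HOURS = 200            # "hundreds of hours" on a conventional supercomputer
CURRENT_EVALS = 2_000_000      # "millions" of equation evaluations
REQUIRED_EVALS = 200_000_000   # "hundreds of millions" of evaluations
FORECAST_DEADLINE_HOURS = 4    # "we need the data in hours"


def scaled_runtime_hours(current_hours, current_evals, required_evals):
    """Scale a run time linearly with the number of equation evaluations."""
    return current_hours * required_evals / current_evals


def speedup_needed(runtime_hours, deadline_hours):
    """Factor by which the machine must be faster to meet the deadline."""
    return runtime_hours / deadline_hours


if __name__ == "__main__":
    hours = scaled_runtime_hours(CURRENT_HOURS, CURRENT_EVALS, REQUIRED_EVALS)
    print(f"{hours:,.0f} hours, or about {hours / HOURS_PER_YEAR:.1f} years")
    print(f"speedup needed: {speedup_needed(hours, FORECAST_DEADLINE_HOURS):,.0f}x")
```

Under these assumptions the accurate model needs 20,000 hours (a little over two years) on the conventional machine, and a machine several thousand times faster to deliver a forecast in hours.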



GRAPHICS AND VISUALIZATION

The volume of data produced by such powerful computing machines
will tax storage media, input/output devices, and the networks
that carry the data to the user. The only reasonable way to review
the data is visually. For example, accurate three-dimensional
flowfield simulations might produce as many as ten billion
potentially important bytes of data at each timestep. A typical
simulation would involve thousands of timesteps. Thus, there might
be as much as ten trillion bytes of data to be processed from one
problem. If this document had that much information in it, it
would be a few billion pages long! Of course, only a very small
fraction of that data could be used in any human endeavor.
Visualization enables us to represent information in a form that
effectively uses human perceptive capabilities.

NETWORKING

To be effective, HPCC must be available to a large user
population, not only in the national laboratories but in
universities and industry as well. A goal of the HPCC Program is
to disseminate this computing capability throughout American
science, technology, commerce, and industry. This will require
local and national networks with capabilities matched to the
computing resources being accessed. Further, within a computing
center, the networks to tie machines together with one another and
with storage devices will need awesome bandwidths: billions of
bytes of information per second or more. The technology to carry
out this networking function is part and parcel of the same
technology in the computers that generate the data: fast switches,
high-speed interconnects, networking protocols and services, and
increasingly sophisticated network-server computers.


WORKSTATIONS AND PERSONAL 
COMPUTERS 

The impact of massively parallel supercomputers will be
complemented by new generations of personal workstations with
unprecedented power. This power will spill over into the personal
computer (PC) market. The cost of low-end workstations is now
comparable to that of well-equipped PCs. PCs are now acquiring
workstation-like functions. This trend will continue until there
is little to distinguish the two classes. It should not be
surprising that workstations and PCs will share in the power
revolution: many of today's massively parallel computers use
processors that grew out of PC and workstation processor
technology.

Today's personal workstations provide the power of yesterday's
mini-supercomputer for a few thousand dollars, a major achievement
of American industry. In two years workstations may have the power
of a CRAY Y-MP processor on a variety of applications. Much of the
work currently suitable only for supercomputers will be feasible
on inexpensive workstations.

Using MatVu, a matrix visualization package, researchers at the
Center for Supercomputing Research and Development have been able
to classify the convergence patterns of numerous eigenvalue and
singular value spectra by assigning a logarithmically scaled color
table to the magnitudes of the off-diagonal elements in each sweep
of the particular Jacobi algorithm.

Parallel Jacobi methods (two- and one-sided) on the Alliant FX/8
and CRAY X-MP/48 have been quite effective for computing the
eigenvalues and eigenvectors of rectangular matrices.

The color table used in the images ramps from black (matrix
elements of magnitudes less than or equal to the 64-bit machine
precision) through blue, green, yellow, and orange to red (matrix
elements of largest magnitude). The diagonal elements (yellow,
orange, and red) in each of the matrices represent improved
approximations to the exact singular values, with S A yielding the
most accurate approximation.
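
The convergence behavior that the MatVu images display can be reproduced numerically on a small scale. The sketch below is a plain serial, two-sided cyclic Jacobi iteration for a symmetric matrix; it is not the parallel code run on the Alliant or CRAY machines, and the function names, the test matrix, and the clamping value used for the logarithmic scale are assumptions made for illustration.

```python
import math


def off_diagonal_norm(a):
    """Frobenius norm of the off-diagonal part of a square matrix."""
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def jacobi_sweep(a):
    """One cyclic sweep of the two-sided Jacobi method, in place, on symmetric a."""
    n = len(a)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if a[p][q] == 0.0:
                continue
            # Rotation angle that zeroes a[p][q]; the smaller root is chosen.
            theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(1.0, theta))
            c = 1.0 / math.hypot(1.0, t)
            s = t * c
            for k in range(n):  # a <- a * J (columns p and q)
                akp, akq = a[k][p], a[k][q]
                a[k][p] = c * akp - s * akq
                a[k][q] = s * akp + c * akq
            for k in range(n):  # a <- J^T * a (rows p and q)
                apk, aqk = a[p][k], a[q][k]
                a[p][k] = c * apk - s * aqk
                a[q][k] = s * apk + c * aqk


def log_magnitude(x, floor=2.0 ** -52):
    """log10 of |x|, clamped at an assumed 64-bit machine precision."""
    return math.log10(max(abs(x), floor))


def convergence_history(matrix, sweeps=6):
    """Return the off-diagonal norm before and after each sweep, and the result."""
    a = [row[:] for row in matrix]  # work on a copy
    history = [off_diagonal_norm(a)]
    for _ in range(sweeps):
        jacobi_sweep(a)
        history.append(off_diagonal_norm(a))
    return history, a


if __name__ == "__main__":
    m = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 1.0]]
    history, d = convergence_history(m)
    for sweep, norm in enumerate(history):
        print(sweep, f"{log_magnitude(norm):.1f}")
    print("eigenvalue estimates:", sorted(d[i][i] for i in range(len(d))))
```

Printing the logarithm of the off-diagonal norm per sweep is the one-number analogue of the color ramp: large values correspond to the red end, and values at the clamp correspond to black.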

COMPUTING ENVIRONMENTS 

An exciting aspect of the HPCC Program is its promise, at
relatively modest investment, to revolutionize our approach to
science, commerce, and industry. The combination of massively
parallel supercomputers, data management and visualization
computers, ultra-fast networks, and very powerful personal
workstations will greatly increase our knowledge base and speed
the rate of scientific progress. Similarly, the design process in
industry will be enhanced: rapid prototyping will be a reality,
the cost of complex designs should decrease, and product quality
should benefit greatly. Accessing immense commercial databases
will become an integral step in business planning. To the extent
that information and knowledge are the currency of tomorrow, the
HPCC Program will provide a differentiating technology for
America.

DOE's challenge is to help foster continued
development of the diverse pieces of the
HPCC Program, but especially to provide the
expertise to glue the pieces into a seamless
whole.

HIGH PERFORMANCE DATA STORAGE
SYSTEMS

High performance data storage systems need
to be developed that are able to effectively
serve an environment of massively parallel
computers, general-purpose supercomputers,
scientific workstations, and visualization.

Cedar, the experimental shared-memory multiprocessor system of the
Center for Supercomputing Research and Development, has a scalable
hierarchical structure. Clusters, each composed of eight
processors, are connected to a global memory by means of a very
high bandwidth shuffle-exchange global network. The
shuffle-exchange network is architecturally scalable because it
has a fixed number of lines per I/O node, regardless of network
size. In addition, the system exhibits scalability as a result of
the power of the network and the hierarchical memory provided by
the clusters. Cedar exhibits stable performance over a wide range
of applications because of this two-level parallelism, within and
between clusters, for fine and coarse grain parallelism,
respectively.

The Advanced Computing Research Facility (ACRF) at Argonne
National Laboratory is committed to research on computers with
innovative designs, with parallel architectures as the principal
focus of attention. The ACRF comprises a wide variety of advanced
computers, ranging from a four-processor graphic supercomputer to
a 16,000-processor massively parallel Connection Machine.

This diversity of machines enables researchers to conduct
experiments on software portability among different computer
architectures and to evaluate the suitability of different
configurations, including shared memory and distributed memory,
for specific applications. The machines are used in a wide range
of research projects, including design of parallel algorithms,
analysis of languages for programming multiprocessor systems, and
development of parallel programming methodologies for writing
parallel programs.

Established in 1984, the ACRF has become recognized as one of the
world's leading centers for parallel computing research.
Approximately three hundred researchers from industry,
universities, and other government laboratories use the ACRF
machines each month for studies in software and algorithm
development.

To encourage use of the facility, the ACRF sponsors classes,
workshops, and institutes on parallel computing. More than 500
scientists have participated in these activities, and many of
these researchers continue to use the ACRF machines remotely via
national networks. The ACRF also has established two affiliates
programs to promote the transfer of research results to industry
and academia. Currently, 15 industrial affiliates and 27
university affiliates are using the ACRF advanced computers to
conduct computational research at the leading edge of technology.

Work supported by DOE and

One of the very important and very difficult challenges of this
decade will be to satisfy the data storage and data access
requirements in the HPCC environment. Current large problems that
run on supercomputers generate from one to ten gigabytes of data.
This data must be saved and then be available for quick access.
With the current success in moving large problems to massively
parallel computers such as the Connection Machine, the data
storage and data access requirements have dramatically increased.
A large problem on the Connection Machine will generate from tens
of gigabytes up to a terabyte. To fully support the current
state-of-the-art, large memory, massively parallel supercomputer,
it would be necessary to achieve data transfer rates of at least
50 megabytes/second and a storage capacity of at least a hundred
terabytes of data. As the massively parallel machines become more
powerful, the data handling requirements will likewise increase.
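
To put these storage figures in perspective, the sketch below estimates how long it takes to move one problem's output at a sustained transfer rate, and how many such outputs a hundred-terabyte store would hold. The 50 megabytes/second rate and the decimal unit definitions are assumptions of the sketch; the data set sizes are the ones quoted above.

```python
MB = 10**6   # decimal units assumed throughout
GB = 10**9
TB = 10**12

TRANSFER_RATE = 50 * MB    # assumption: sustained bytes per second
STORE_CAPACITY = 100 * TB  # "at least a hundred terabytes"


def transfer_time_hours(n_bytes, bytes_per_second):
    """Hours needed to move n_bytes at a sustained rate."""
    return n_bytes / bytes_per_second / 3600


def datasets_per_store(capacity_bytes, dataset_bytes):
    """Number of whole data sets of a given size that fit in the store."""
    return capacity_bytes // dataset_bytes


if __name__ == "__main__":
    for label, size in (("10 gigabytes", 10 * GB), ("1 terabyte", 1 * TB)):
        hours = transfer_time_hours(size, TRANSFER_RATE)
        count = datasets_per_store(STORE_CAPACITY, size)
        print(f"{label}: {hours:.2f} hours to move, {count} fit in the store")
```

At the assumed rate a ten-gigabyte result moves in a few minutes, while a one-terabyte result takes more than five hours, and only a hundred such results fit in the store. This is the sense in which the requirements are described as minimums.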

To meet these data storage and data access 
requirements, it will be necessary to develop 
very high performance storage systems with ad- 
vanced data management capabilities. Such 
storage systems must be scalable to meet the in- 
creasing requirements and to maintain a bal- 
anced overall HPCC environment. 



ADDITIONAL DOE ROLES IN HIGH 
PERFORMANCE COMPUTING 

DOE continues, through its leading role in the 
computational sciences, to have a strong interest
in shaping the development of future HPCC
environments. An important goal for DOE is to 
foster architecturally balanced approaches us- 
able by a broad class of applications. These ap- 
plications are the foundation for the basic re- 
search needs of many U.S. industries. 

Specific DOE interests include (1) issues
associated with the scalability of architectures
and their impact on software, (2) research 
aimed at providing the resources needed for 
components of future systems, (3) reliability, 
and (4) integration of the components of this 
distributed computing environment. 

The DOE laboratories have a great deal of expertise in simulating
advanced computer architectures. This capability will be important
to investigations of the scalability of multiprocessor
architectures. It can also enable accurate prediction of the
performance of important applications on new designs before they
reach hardware production. Thus, design flaws can be remedied
early in the product development cycle before changes become
prohibitively expensive.

Software and algorithms are the keys in making HPCC systems both
useful and economically successful. Furthermore, for earlier
generations of computers, there is clear evidence that
improvements in software and algorithms led to larger performance
gains than did improvements in machine architecture. However, the
computing systems of the past had relatively simple architectures,
making it possible to largely decouple applications algorithm and
software development from the details of machine architecture.
Today, the simple architectures have been thoroughly exploited.
The complex architectures necessary to carry HPCC through the
present decade require extensive development of new software
technology, numerical methods, and applications algorithms in
order that their potential be realized.



The Center for Computational Engineering
(CCE) at Sandia National Laboratories,
Livermore was established to develop and apply 
computational technology to problems of 
national interest. Sponsored by the Department 
of Energy and private industry, the CCE is 
tasked with bringing together researchers and 
computer scientists from government and 
private industry to conduct mutually beneficial 
scientific research. 

Central to the CCE is the new generation of 
massively parallel supercomputers. To be able 
to use these massively parallel supercomputers, 
existing software must be rewritten in parallel
form and new numerical methods must be
developed. Initial work will be with sponsors in
four separate modules: global climate change,
macromolecular design and environmental
health, software engineering, and field data
management.

Through the CCE's modules, Sandia will 
help corporate sponsors apply the new super- 
computers to problems in their own areas of 
expertise. In other words, the modular structure
will facilitate technology collaboration. By
pooling the talents of scientists from industry,
universities, and national laboratories, the
CCE will automatically transfer computational
technology, thereby helping sponsors reduce
the concept-to-design cycle and speed product
introduction. The sponsors' industrial expertise
will help focus the CCE's research on real- 
world applications. 



Diagram (panels: Modules, Computers): The CCE links engineering "modules" to evolving parallel processing technology.

Work supported by DOE. 



Addressing this challenge requires a rethinking
of disciplinary and organizational boundaries
and an extraordinary teaming effort. Integrated
approaches, reaching all the way from
computer hardware design to grand challenge
applications, must be sought. In order to create
the software and algorithms for parallel, distributed,
and hierarchical computing, computer scientists
and applied mathematicians will need to
communicate better with scientists in these
important applications areas.

Most of the Federal HPCC generic software
technology development will be funded by
DARPA. Because of the critical need for
interagency coordination in this area, all participating
agencies will coordinate their advanced
software technology and algorithms (ASTA) 
programs through the HPCCIT working group. 



SUPPORT FOR GRAND CHALLENGES 

The DOE, both as a natural consequence of its
mission and specifically by virtue of its need to 
solve many of the grand challenge problems, 
has at its disposal a broad range of important 
applications on which to base a strong and ag- 
gressive software technology and algorithms 
support effort. To be successful, this effort will 
need to operate in close collaboration with 
grand challenge researchers so that a detailed 
understanding of applications will be incorporated
into the earliest stages of software and algorithm
design and so that application code design
decisions will be made in light of full
understanding of feasible software and algorithm
options. Grand challenge research teams
will be specifically called on to test emerging 
software and algorithms and to recommend 
modifications and improvements. Early transfer 
of developed technology to the private sector 
will be vigorously pursued to provide both financial
leverage and additional feedback on the
effectiveness of the DOE-supported development
efforts.

Pioneering research at DOE's Sandia
National Laboratories in parallel methods,
applications, and performance evaluation on
a 1024-processor NCUBE/ten hypercube won
the Karp Challenge and the inaugural
Gordon Bell Award in 1988, as well as an
R&D 100 Award in 1989. This research
demonstrated 1000-fold parallel speedups for
three full-scale scientific applications on a
commercial parallel processing system and
introduced three new concepts of performance
modelling: scaled speedup, fixed-time
speedup, and operation efficiency. Sandia was
given a second R&D 100 Award in 1989 for
the Linda Project, its innovative research in
parallel distributed computing using a large
number of networked heterogeneous
computers.

Work supported by DOE.
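
The difference between speedup on a fixed-size problem and speedup on a problem that grows with the machine can be illustrated with a short sketch. The two formulas are the textbook forms of Amdahl's law and of scaled (Gustafson) speedup; they are offered as an illustration of the concepts named above, not as the Sandia team's own performance model. The serial fraction of 0.5% is an assumed value chosen only for the example.

```python
def fixed_size_speedup(serial_fraction, processors):
    """Amdahl's law: the problem size stays fixed as processors are added."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction, processors):
    """Scaled speedup: the parallel part of the work grows with the machine.

    Here serial_fraction is the serial share of the run time measured on
    the parallel machine, not on a single processor.
    """
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    n = 1024   # processor count quoted in the text
    s = 0.005  # assumed serial fraction (hypothetical, for illustration)
    print(f"fixed-size speedup: {fixed_size_speedup(s, n):.1f}")
    print(f"scaled speedup:     {scaled_speedup(s, n):.1f}")
```

With the same small serial fraction, the fixed-size estimate saturates far below the processor count, while the scaled estimate stays close to it. This is one way to see why speedups near 1000 on 1024 processors are plausible when the problem is allowed to grow.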



Energy Conservation and Fossil Fuel 
Combustion 

For the foreseeable future, 90% of the energy 
needs of the United States will be met by the 
combustion of fossil fuels. Fossil fuels are 
burned in stationary combustion chambers, for 
example, for electrical power generation, and in 
mobile forms such as in automobiles. 

Automobile engines are most efficient when
run at high temperatures (Carnot effect), but
increased temperature leads to increased nitrogen
oxide emissions. The burning of alternative
fuels such as methanol is complicated by the
emission of formaldehyde, a known carcinogen,
into the atmosphere. Pollutants are affected
by local geology and climatic conditions,
making it necessary when seeking solutions to
take into account the total system of fuel, engine,
and atmosphere. Our environment is too
delicate to be used as a testbed; therefore, we
must use supercomputers to simulate the
atmospheric effects before experimenting.



An important example of the application of
DOE supercomputer methodology is in the
development of an engine design code that is fully
three dimensional and includes models of
chemical reactions, fluids, gases, and particulate
formation. The code is in a constant state of
evolution and is designed to handle the most
complex engines, such as the stratified charge
engine and the two-stroke engine.

A detailed understanding of fuel burning
inside an internal combustion engine requires
the comprehension of a number of complex
processes. One example is the Direct-Injection
Stratified Charge (DISC) engine, which
requires high performance computing for its
optimum design. DISC is an experimental
engine design. Its goal is to run leaner and
cooler than conventional engines, thereby
reducing emissions while increasing the
compression ratio and control of ignition for
greater efficiency and fuel economy. The
burning of the fuel-air mixture in such an
engine brings into play a few hundred different
chemical reactions between numerous short-
and long-lived chemical species. The rates at
which these reactions occur depend on the
temperature and concentrations of the species.
For instance, as combustion proceeds, various
pollutants may be formed, such as nitrogen
oxides (NOx) and unburned hydrocarbons.

These hydrocarbons may form solid particles 
known as soot. Also, the fuel may autoignite in 
some region before the flame arrives, giving 
rise to the small explosion known as engine 
knock. 
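
The dependence of reaction rates on temperature and species concentrations can be sketched with the standard Arrhenius form and a power-law rate expression. This is a generic textbook illustration, not the chemistry model of the engine code discussed here; the pre-exponential factor, activation energy, and concentrations below are hypothetical values.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def arrhenius_rate_constant(pre_exponential, activation_energy, temperature):
    """Rate constant k = A * exp(-Ea / (R * T)), with Ea in J/mol and T in K."""
    return pre_exponential * math.exp(-activation_energy / (R_GAS * temperature))


def reaction_rate(rate_constant, concentrations, orders):
    """Power-law rate: k multiplied by each concentration raised to its order."""
    rate = rate_constant
    for concentration, order in zip(concentrations, orders):
        rate *= concentration ** order
    return rate


if __name__ == "__main__":
    # Hypothetical single reaction evaluated at two gas temperatures.
    for temperature in (1500.0, 2000.0):
        k = arrhenius_rate_constant(1.0e10, 150.0e3, temperature)
        print(temperature, reaction_rate(k, [0.02, 0.2], [1, 1]))
```

A full combustion model evaluates an expression of this kind for every reaction in every computational cell at every time step, which is why a few hundred reactions quickly become expensive.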

Because engine operation is so complex, 
designing an optimum engine by experimentation
(varying all design parameters independently)
is prohibitively expensive. A computer
model of the engine allows us to effect design 
changes numerically on a scale of hours at a 
fraction of the cost. 

The engine components depicted in the 
illustration are a cupped piston, fuel injector, 
and spark plug. The piston may travel at speeds
of up to 90 miles per hour. It draws in fresh air 
and expels exhaust gases as the combustion 
cycle progresses. The air and exhaust gases 
flow at approximately the piston velocity. At an 
appropriate point in the cycle (upper right in 
the figure), liquid fuel is injected into the 
engine cylinder. The fuel jet breaks up into 
small droplets, which interact with the air. The 
droplets alternately elongate and flatten; some
of them break up into two new droplets, others
collide and coalesce to form larger drops. The
spray is caught up by the air flow and swirls 
around in the cylinder. It simultaneously 
evaporates to form a gaseous fuel/air mixture. 

Shown on the right is a computer simulation 
depicting four different times during the 
combustion cycle of a DISC engine. The multi- 
colored surface indicates the fuel's location. 
When the spray has evaporated sufficiently, a
spark ignites the mixture, and combustion
begins as a flame propagating through the
chamber. As the engine proceeds through its
cycle (indicated by the increase in crank angle
from -23 to +17 degrees), the fuel burns,
resulting in high temperatures (see color code).

Each of the models within the code represents
an approximation to the facts, sometimes
because the facts are not well understood but
often because the capabilities of existing computers
do not allow such information to be included.
For example, the over 400 chemical rate
processes involving hydrocarbon and nitrogen
chemistry are treated globally by fewer than ten
such reactions in order that the code run in a
few hours on a large supercomputer. Since,
however, the 400 reactions are known from first
principles, the problem is addressable; what are
required are better algorithms running on a machine
1000 times more powerful, a teraflop
machine.

This computational design technology is used
by many private industrial engine design firms
as well as universities and government laboratories.



Present engine models are deficient in one 
way or another because of limitations in 
computing resources. Computer codes, such as 
KIVA, can provide information on large scale
fluid flows, but are limited in their ability to 
describe the small scale flow structures in 
turbulent flow. Furthermore, fuel chemistry 
models are necessarily crude. Current chemical 
kinetics calculations approach the detail 



necessary for an adequate description of the 
chemistry, but even these models require serious 
improvements. Such calculations are capable of 
only the most rudimentary fluid mechanical 
effects. 

The envisioned increase in computing power 
is critical to the effective development of clean, 
efficient engines. 

Work supported by DOE.

A global ocean model running on the
Los Alamos CM-2 is the first realistic general
circulation model to be implemented on a
massively parallel machine. The code was
rewritten for the CM-2 based on the Cray
version of the Semtner-Chervin global ocean
model, which is highly parallelized to run on
the multiprocessor CRAY X-MP or Y-MP. For
large problems, the CM-2 code runs with
speeds comparable to a full 8-processor Y-MP.
New algorithms were implemented to substantially
improve the code's performance. The
model is a finite difference code with realistic
surface and ocean-bottom topography.

To fully resolve the complicated currents 
and circulation in the ocean requires grids 



with spacing substantially less than 50
kilometers. The largest calculations that have
been run to date with the Cray version used
grids with 55-kilometer resolution. With the
current CM-2, it is possible to solve problems
with a 211-kilometer resolution. The ultimate 
aim in global climate studies is to develop an 
advanced climate model capable of describing 
the fully coupled ocean/atmosphere system, 
which will be necessary for solving important 
problems, such as predicting the effect of 
greenhouse gases on global warming. 
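
A rough sense of why finer grids are so demanding can be had from a simple scaling estimate. The sketch below assumes that the number of vertical levels is held fixed and that an explicit time-stepping scheme is used, so that the allowable time step shrinks in proportion to the grid spacing; both are assumptions made for illustration and are not statements about the Los Alamos code. The function name and its parameters are hypothetical.

```python
def relative_cost(dx_old_km, dx_new_km, time_step_scales=True):
    """Estimate the cost ratio when horizontal grid spacing is refined.

    Horizontal grid points grow as the square of the refinement factor.
    If the time step must shrink with the spacing (assumed here for an
    explicit scheme), the number of time steps grows by the same factor.
    """
    refinement = dx_old_km / dx_new_km
    horizontal_points = refinement ** 2
    steps = refinement if time_step_scales else 1.0
    return horizontal_points * steps


if __name__ == "__main__":
    # Halving the 55-kilometer spacing quoted in the text.
    print(relative_cost(55.0, 27.5))
```

Under these assumptions, halving the grid spacing multiplies the work by eight for the same simulated time span.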

The image shows the observed temperature 
(at 160 meters depth), which is used in model 
calculations. The warmest areas, in the 
equatorial region, appear red; the coolest 
areas, near the poles, appear orange and 
yellow; and the areas in between are shown in 
blue and green.

Work supported by DOE.

Global Climate Modeling

It is paramount that we improve our understanding
of global climate and its potential impact
upon human activities and the environment.
Generation of carbon dioxide, methane, and
other greenhouse gases as a by-product of energy
production and other activities must be better
quantified, and our capability to predict their
influence on climate must be greatly enhanced if 
we are to formulate rational energy strategies 
and policies. Numerical computer models pro- 
vide the only means for projecting the impact of 
greenhouse gases on future climate. Present-day 



global climate models are able to simulate sat- 
isfactorily some aspects of the current climate, 
but comparison of future climates predicted by 
various models reveals significant and discon- 
certing disagreements. 

A new generation of models that have finer
spatial resolution and more realistic treatment
of the critical physical and chemical processes
that control our climate is needed. In addition,
longer computations spanning many decades of
simulated time (200 years or more) will be
needed. Current models, however, are constrained
by the resources of even the largest
supercomputers now available. CHAMMP
(the Computer Hardware, Advanced Mathematics,
and Model Physics Climate Modeling
Program) is a DOE program specifically
intended to develop within 10 years advanced
climate models with the above-mentioned
improvements and also capable of much longer
simulations. The advanced computer systems
and software technologies being developed by
the DOE HPCC Program will provide capabilities
critical to the success of CHAMMP and to
solving the nation's environmental and energy-
related problems.

Biosciences 

Biological research plays a major role in im- 
proving the quality of our lives. Computer use 
is currently a major activity of structural biology
research and will become increasingly important.
There are two categories of use: (1)
data acquisition and analysis and (2) modeling 
and theory. Also noteworthy are the efforts be- 
ing made to map and sequence the human ge- 
nome — the full set of instructions for a human 
being. Such research is crucial if we are to de- 
termine the genetic basis of many human dis- 
eases. 

Currently one of the principal uses of computers
is for data acquisition and analysis. Efficient
use of high-intensity photon and neutron
beams provided by existing facilities demands
online computer control of beams, spectrometers,
and detectors. Data acquisition rates from
diffraction and nuclear magnetic resonance
(NMR) experiments are high, and analysis of
data using established algorithms, an essential
component of most structural biology research
sponsored by the DOE Office of Health and
Environmental Research, is very computer intensive.
Current computer resources are not
adequate to support the efficient use of existing
facilities. Modeling of macromolecular dynamics
for crystallographic data refinement and
electrostatic field calculations for determination 
of structures in aqueous solution are extremely 
demanding of computer power and time. These 
types of simulations are important in that they 
can provide details of processes that cannot be 
easily probed by experimental techniques, such 
as the relaxation of water molecules in the vi- 
cinity of biological molecules. 

In the future, efficient use of high flux 
beams at High Flux Beam Reactor and National 
Synchrotron Light Source and the advanced ca- 
pabilities of the Manuel Lujan Jr. Neutron Scat- 
tering Center, the Advanced Light Source, the 
Advanced Photon Source, and the Advanced 
Neutron Source will require application of advanced
and increasingly computer-intensive
data acquisition and analysis protocols.
Requirements for computer-intensive molecular
dynamics and electrostatic calculations will
increase as more structural biology data is acquired
and analyzed. Development of parallel
processing and computer graphics will be required
to support online structural analysis and
computer-based search strategies for sequence/ 
structure or structure/function correlations. 

The human genome initiative requires algorithms
for comparing the "fingerprints" of
pieces of DNA that are known to be derived
from a larger piece of DNA but whose degree
of overlap and spatial ordering relative to one
another is not known. Determining these relationships
is highly computationally intensive
and stretches the art of combinatorics. Moreover,
the requirements for this work are far
from static; the quantity of sequence data has 
been doubling every year or two. If the human 
genome initiative is successful, the quantity of 
data 15 years from now will be 100 to 1000 
times the current amount. The sequencing ef- 
fort will require computational power, unprece- 
dented in the realm of molecular biology, for 
improved data management and analysis, and 
communications links between those generat- 
ing, managing, and accessing the experimental 
and interpretive data. 
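
The growth figures quoted above can be checked with a few lines of arithmetic. The sketch assumes simple exponential growth with a constant doubling time, which is an idealization; the function names are chosen for this example only.

```python
import math


def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years` at a constant doubling time."""
    return 2.0 ** (years / doubling_time_years)


def doubling_time_for(factor, years):
    """Doubling time that yields the given growth factor over `years`."""
    return years * math.log(2.0) / math.log(factor)


if __name__ == "__main__":
    for doubling_time in (1.0, 1.5, 2.0):
        print(doubling_time, growth_factor(15.0, doubling_time))
    print(doubling_time_for(100.0, 15.0), doubling_time_for(1000.0, 15.0))
```

Under this assumption, a 100-fold to 1000-fold increase over 15 years corresponds to a doubling time of roughly 2.3 to 1.5 years, consistent with the "every year or two" figure; doubling every single year would give a far larger factor.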

Materials Sciences 

Many of the problems associated with materials 
research require computation of the subtle 
many-body effects that manifest themselves in 
high-temperature superconducting materials 
and the properties of complex polymers. Chal- 
lenges in the realm of computational physics, 
chemistry, and engineering include (1) elucidating
the mechanisms of chemical catalysts,
the design of new catalysts, and the design of 
new materials for separations science; (2) 
understanding the electronic properties of novel 
materials, such as high-temperature supercon- 
ductors, polymers and synthetic metals; (3) ex- 
tending the empirical molecular mechanics ap- 
proaches currently employed in the pharmaceu- 
tical industry for the design of drugs; and (4) 
designing new materials and alloys with novel 
properties based on an understanding of the in- 
teraction among the material constituents at the 
molecular level. Initial small applications of 
these challenges tax the capability of today's
most powerful computing technologies and will
help to influence their future directions.

While experimental techniques such
as two-dimensional NMR spectroscopy can
provide detailed information about the relative
distances between atoms in biological molecules,
the translation of this information into
three-dimensional structural information can
be enhanced considerably by the use of
molecular simulations. A recent study of
interactions in drug-DNA complexes, performed
at the Los Alamos National Laboratory,
was carried out to construct a set of
models consistent with two-dimensional NMR
data. The particular case involved a complex
interaction shown in the figure between a
specific anticancer drug (Distamycin-2) and a
segment of DNA. The atoms of the anticancer
drug are represented by the small cyan
objects. The surface of the DNA is indicated
by larger atoms, where different colors
represent different bases: blue is guanine,
green is cytosine, yellow is adenine, purple is
thymine, red represents the charged phosphate
groups, and white represents sugar.

A molecular dynamics simulation followed 
by energy minimization showed that there are 
actually three distinct structures consistent with 
the NMR data and that in two of these struc- 
tures the DNA molecule is bent significantly. In 
a nonlinear mode of motion shown by the drug- 
DNA complex, each of the three distinct 
minima is sampled every 5 ps (10^-12 s). The
combination of two-dimensional NMR and 
computer simulations gives a much clearer 
interpretation of the structure and dynamics of 
drug-DNA complexes than would be obtained 
by simply attempting to fit the experimental 
data. Calculations of this type can also provide 
guidance for future studies of protein-DNA 
interactions at the atomic level. 

Work supported by DOE.



A key aspect of designing
improved materials is the understanding
of their response to various kinds of
dynamic loading, including fracture
properties (such as the yield stress and
the mode of fracture). Given a model for
the interatomic forces, the fracture
process can be simulated using molecular
dynamics (MD) techniques, provided that
the trajectories for a macroscopically
large sample of atoms can be followed
for a sufficiently long time on a computer.
Simulations of ~10^12 atoms (i.e., a
block 1 µm on edge) are required to
properly model the effects of grain 
boundaries, plastic deformation, and 
crack propagation. Until very recently, 
the largest MD simulations have involved 
IV to 10* atoms, so that no realistic 
calculations of fracture in three-dimen- 
sional solids have yet been attempted. 
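
The core of a molecular dynamics calculation, following atomic trajectories under a given force model, can be sketched in a few lines. The example below is a deliberately tiny illustration: two atoms in one dimension, a Lennard-Jones pair potential in reduced units, and velocity-Verlet time stepping. These are assumptions made for the sketch; the work described here uses a many-body potential appropriate for a metal and vastly more atoms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Magnitude of the pair force, -dU/dr (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def pair_forces(x):
    """Equal and opposite forces on two atoms located at x[0] < x[1]."""
    f = lj_force(x[1] - x[0])
    return [-f, f]


def velocity_verlet(x, v, dt, steps, mass=1.0):
    """Advance positions x and velocities v by `steps` time steps of size dt."""
    f = pair_forces(x)
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]               # drift
        f = pair_forces(x)                                       # new forces
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
    return x, v


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the two-atom system."""
    kinetic = 0.5 * mass * sum(vi * vi for vi in v)
    return kinetic + lj_energy(x[1] - x[0])


if __name__ == "__main__":
    x0, v0 = [0.0, 1.5], [0.0, 0.0]
    x1, v1 = velocity_verlet(x0, v0, dt=0.001, steps=5000)
    print(total_energy(x0, v0), total_energy(x1, v1))
```

The two printed energies should agree closely, which is the usual sanity check for an integrator of this kind. In a production code the force evaluation dominates the cost, and it is this step that is distributed across the processors of a parallel machine.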

An exceptionally fast MD computer 
code is being developed to exploit the
massively parallel architecture of the
Connection Machine (CM-2). It will
perform simulations on >10^6 atoms,
corresponding to a two-dimensional
square of material approximately 0.25
µm on edge. Many of the features of
three-dimensional fracture are retained 
in two-dimensions, so that these simula- 
tions will make the first direct connec- 
tions between the atomic interactions and 
the macroscopic fracture properties. 

Shown here is a sequence of snapshots 
from a sample run of a spall process, 
using ~30 K atoms, interacting with a
many-body potential appropriate for a 
metal. The locally averaged horizontal 
velocity is represented as a continuous 
color rainbow. In the top frame, the flyer 
plate (shown at the left in blue) is moving
to the right and is about to collide with 
the sample (shown in greenish-orange 
moving to the left). The relative velocity 
of collision is 25% of the metal sound 
speed. The resulting shock waves move 
through the material, which ultimately 
fractures at the point where two rarefac- 
tion waves meet. Significant plastic work 
results in a damaged spall fragment 
exiting to the right.

Since the 1986 discovery of a new class
of superconducting ceramic oxides, scientists
around the world have been involved in
intensive research efforts to understand and
fabricate practical high-temperature superconductors.
These efforts to model the structural,
vibrational, and electronic properties of
matter can be greatly assisted by powerful
computers, especially since parallel supercomputers
and sophisticated algorithms can
now be combined to provide sufficient
processing power.

Physicists and computer scientists at Oak
Ridge National Laboratory (ORNL) have
collaborated to develop a multipurpose
parallel computer code to calculate the
electronic structure of real materials from
first principles based on the Korringa, Kohn,
and Rostoker coherent potential approximation
(KKR-CPA) theory of magnetism and
alloys. The model allows multiple atoms per
unit cell and is ideally suited for situations in
which substitutional disorder plays an
important role, such as in high-temperature
superconductors. The code executes efficiently
on serial computers, on shared-
memory multiprocessors such as the CRAY Y-MP,
on distributed-memory multiprocessors
such as the Intel iPSC/860, and on a
distributed network of computers.

Initial computational experiments on the 
high-temperature perovskite superconductors 
Ba1-xKxBiO3 and BaPb1-xBixO3 (performed on
the 8-processor CRAY Y-MP at the Ohio
Supercomputer Center and the 128-processor
Intel iPSC/860 at ORNL) have revealed
several interesting details related to alloy
softening of the density of states and Fermi
surface nesting. Moreover, these results
revealed that experiments on more complex
superconductors such as (La1-xSrx)2CuO4 will
be able to execute at a rate of over 2.5 Gflops
on the Intel iPSC/860.






Plasma Physics Research 

The successful development of magnetic
fusion or inertial fusion as an alternative
energy source requires a deep and detailed
knowledge of plasma phenomena. Numerical
simulation has made many contributions to the
basic understanding of plasma phenomena.
One can point to the discovery of nonlinear
effects in space plasma physics, inertial
confinement physics, and magnetically
confined plasmas that came about through
numerical simulation. The problems are
typically complex with many competing
effects acting simultaneously on different time
and space scales, and are inaccessible to
experimental or analytical approaches.
Typically, simulations have given insights into
fundamentally new physical processes that are
striking in the ways that nonlinearity can bring
order out of complexity. Understanding plasma
phenomena depends on the ability to do
simulations in three dimensions with realistic
geometries and long time scales. The single
most important component for continuing
progress is the attainment and successful
application of greater computing power. There
is a present and urgent need to provide realistic
plasma simulations in three physical dimensions
involving complex bounded systems.

Another area with potentially important applications
involves the interdisciplinary coupling
of molecular physics with plasma science.
For example, to be able to model the synthesis
of novel materials using chemical vapor
deposition (CVD), one would need to combine
the computational techniques of molecular
physics with numerical plasma simulations.

Fundamental Physics and QCD 

Field theories such as quantum electrodynamics
and quantum chromodynamics (QCD) are
believed to describe the entire range of
physical phenomena on the atomic, nuclear,
and subnuclear scales. QCD is the theory of
strongly interacting particles such as nucleons
and mesons. The highly nonlinear regimes of
these theories cannot be solved with traditional
techniques of quantum mechanics. However,
an approach in which space-time is approximated
by a lattice makes it possible to predict
the characteristics of strongly interacting
particles. The present calculations are forced to
make severe approximations. With the
continuing development of new computational
techniques as applied to novel computer
architectures it is possible to treat these more
exactly. The goal of these calculations is to
learn how quarks and gluons form composite




Shown above is a visualization of the
pion propagator based on a QCD lattice
generated on the Connection Machine. The
propagator is a function of three spatial
dimensions and one time dimension, so the
data have been averaged over the third
spatial dimension and displayed as a function
of x and y (the short axes) and time (the long
axis). 

The event represented is the creation of a 
pion near the center of the volume and its
propagation in space both forward and
backward in time. The magnitude of the
propagator determines the size of the
"bubbles" (shown in green) in this visualization,
and a representative surface of constant
amplitude is displayed in white. From the
rate at which the amplitude dies out as a
function of time, the pion mass can be
calculated. The generation of the lattice
requires approximately 300 hours on a 16K
CM-2; many such lattices are required to
give a statistical estimate of the pion mass by 
averaging results from the propagator 
calculation based on each lattice. 
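
The step from a decaying amplitude to a mass can be sketched as follows. The example assumes the idealized case in which the propagator amplitude falls off as a single exponential in time, with the mass expressed in lattice units; the "effective mass" is then the logarithm of the ratio of successive amplitudes. The synthetic data and function names are hypothetical, and the sketch ignores the backward-propagating contribution as well as the statistical averaging over many lattices that a real calculation requires.

```python
import math


def synthetic_correlator(amplitude, mass, n_times):
    """Idealized propagator amplitude: A * exp(-m * t) at integer times t."""
    return [amplitude * math.exp(-mass * t) for t in range(n_times)]


def effective_mass(correlator):
    """Effective mass at each time slice: log of C(t) / C(t + 1)."""
    return [
        math.log(correlator[t] / correlator[t + 1])
        for t in range(len(correlator) - 1)
    ]


if __name__ == "__main__":
    data = synthetic_correlator(amplitude=2.0, mass=0.35, n_times=16)
    print(effective_mass(data))
```

For the synthetic input, every time slice returns the mass that was put in. With real lattice data the values fluctuate from lattice to lattice, which is why results from many lattices must be averaged.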
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ne cf the more complex plasma 
problems to be solved is the modeling of
plasma confinement by magnetic fields in a
Tokamak. This modeling requires a full three-
dimensional simulation of the diffusion of a
multicomponent plasma in the twisted toroidal
fields. The accompanying picture shows output
from a three-dimensional finite difference
adaptive mesh simulation. Shown is an interior
surface and a cross section of the Tokamak
with the color of the surface proportional to
the strength of the magnetic field. The use of
color allows the display of the field strength on
multiple surfaces through partial transparency
of one of them. The magnetic field decreases
from the inner edge of the torus (yellow) to the
outer edge (blue). Color shading enables one
to display multiple complex features in three-
dimensional computer models.



Within the Tokamak is depicted the trajectory
of a single charged particle (indicated by
the white line). The orbit is calculated from the
magnetic field data stored on the three-
dimensional mesh. The orbit generally follows
a magnetic field line, while precessing slowly
due to gradients in the field. This calculation is
only a start on this exceedingly complex
problem. Diffusion is a result of the drifting
motion in this magnetic field and the self-
consistent electric fields generated inside the
plasma. A realistic simulation of a Tokamak
will require a computation mesh with a million
zones and several million particles. Edge
effects, plasma heating from beams, and wave
heating, as well as the internal diffusion
process, must be taken into account.
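
Following a charged particle through a magnetic field can be sketched with the Boris scheme, a widely used particle-pushing method. Its use here is an assumption made for illustration; the text does not say which integrator produced the orbit shown. The `b_field` argument is a hypothetical callable standing in for interpolation of the field from the three-dimensional mesh, and the example uses a uniform field with no electric field, in normalized units.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def boris_step(x, v, b_field, q_over_m, dt):
    """Advance position x and velocity v by one step (magnetic field only)."""
    t = tuple(0.5 * q_over_m * dt * b for b in b_field(x))
    t_squared = sum(c * c for c in t)
    s = tuple(2.0 * c / (1.0 + t_squared) for c in t)
    v_prime = tuple(vi + ci for vi, ci in zip(v, cross(v, t)))
    v_new = tuple(vi + ci for vi, ci in zip(v, cross(v_prime, s)))
    x_new = tuple(xi + dt * vi for xi, vi in zip(x, v_new))
    return x_new, v_new


def speed(v):
    """Magnitude of a 3-vector velocity."""
    return math.sqrt(sum(c * c for c in v))


if __name__ == "__main__":
    def uniform_field(position):
        return (0.0, 0.0, 1.0)  # assumed uniform field along z

    x, v = (0.0, 0.0, 0.0), (1.0, 0.0, 0.1)
    for _ in range(1000):
        x, v = boris_step(x, v, uniform_field, q_over_m=1.0, dt=0.05)
    print(x, speed(v))
```

In a uniform field the particle gyrates about a field line while streaming along it, and its speed is preserved by the scheme. The slow precession described in the text appears only when the field has gradients, which would be supplied through a position-dependent `b_field`.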
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structures as will be studied experimentally at 
the Relativistic Heavy Ion Collider and the Superconducting
Super Collider.

QCD is among the most computer-intensive 
problems known. One reason is because full 
QCD is a very complicated field theory, with 
many degrees of freedom. Today's best
calculations are being done with 16x16x16x32 
lattices. It is thought that definitive results will 
not be obtained until we can calculate with 
256x256x256x256 space-time sites and each 
site requires nearly 100 floating point words to 
describe the fields. That is around 400 billion 
words of computer memory required just to 
hold the description of the lattice. Furthermore, 
nearly all arithmetic operations in QCD involve 
complex arithmetic, so the number of opera- 
tions per word of the lattice is high. Extracting 
these results will require computers thousands 
of times more powerful than today's fastest 
supercomputers. 
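
The memory estimate above follows directly from the lattice dimensions. The short sketch below reproduces the arithmetic using the figure of 100 words per site quoted in the text; the function name is chosen for this example only.

```python
def lattice_words(dimensions, words_per_site=100):
    """Words of memory needed to hold the fields on a space-time lattice."""
    sites = 1
    for extent in dimensions:
        sites *= extent
    return sites * words_per_site


if __name__ == "__main__":
    current = lattice_words((16, 16, 16, 32))
    target = lattice_words((256, 256, 256, 256))
    print(current, target, target // current)
```

The target lattice needs about 4.3 x 10^11 words, in line with the figure of around 400 billion quoted above, and more than 30,000 times the storage of the present 16x16x16x32 lattices.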



Environmental Modeling and Remediation

A comprehensive mathematical modeling and 
computational research program is needed in
the area of restoring the earth's surface and 
subsurface environment after contamination. 
Problems in this area are of immense impor- 
tance to DOE and are found in every region in 
our country. Examples include leaky radioactive
or chemical storage containers underground
(gas stations, nuclear waste sites, etc.)
and chemical spills on the surface (oil spills in
the ocean, chemical processing plant spills, 
etc.). Computer models can help provide much 
needed understanding of the complex physical, 
chemical, and biological processes involved, 
and detailed simulations are effective substi- 
tutes for experimental laboratories that can be 
used to test understanding of complex phenom- 
ena and supplement physical intuition. In many 
cases computer simulations are the only fea- 
sible method of studying the phenomena due to 
the long time scales involved in the radioactive 
and chemical processes. 




Using the heat generated by an electric
field to melt a region of contaminated soil is the
first step of the "in-situ vitrification" process
currently under study for removing hazardous 
waste from DOE sites. The soil resolidifies as a 
glass, effectively trapping the contaminants and 
preventing their migration into the groundwa- 
ter supply. An experiment testing the procedure 
has been conducted at Oak Ridge National 
Laboratory. The glassified soil can be seen
between the four electrodes in the test configuration
shown in the photograph.

Computational modeling is essential to 
better understand and control this process. 
DOE researchers are developing algorithms 
for modeling the many complex processes
involved in the vitrification process: heat
generation and transfer, natural convection,
liquid/solid phase change, chemical interac-
tions, and the creation and movement of gas
bubbles. As the process takes place under-
ground, computational models must also be
used to interpret the data obtained from 
necessarily indirect experimental measuring 
techniques.

Computer studies are essential for the devel-
opment of new technologies for identifying,
monitoring, and reclaiming hazardous waste
sites and spills. Many such technologies are
emerging, such as in situ vitrification, bioreme-
diation, organic volatilization, and surface and
subsurface hydrology. Some, no doubt, will fail 
to live up to their potential but others may pro- 
vide solutions that significantly improve the 
process. Even a small improvement can yield 
tremendous savings given the magnitude of the 
problem facing our environment today. Com-
puter simulations will help to determine which
technologies should be vigorously pursued and 
under what conditions they should be utilized. 

Current models incorporating relatively few 
chemical, biological, and physical interactions 
run at 20 to 40 Mflops on a single CRAY
MP processor, and three-year simulations take
on the order of 10 hours of CPU time and over
a day of actual wall clock time. When addi-
tional interactions, such as inclusion of more
important chemicals in the reactions, cracks in
subsurface rock formations, and the potential
exponential growth of biological species, are
incorporated along with the required resolution to
resolve the chemical reactions at the fronts, 
computational requirements quickly surpass the 
capabilities of today's supercomputers, even 
for small three-year simulations. Because of the 
time scales involved, simulations of ten to a
few hundred years are required, which further
enlarge the computational power shortage. 
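
As a rough illustration of the time-scale argument above, the following sketch scales the quoted cost of a three-year simulation (about 10 CPU hours) to longer simulated periods. The linear scaling is an assumption made purely for illustration and is not taken from the report; the added interactions and finer resolution described above would raise the cost further.

```python
# Rough cost scaling for longer simulated periods.
# Assumption (not from the report): CPU time grows linearly with simulated years.
BASE_YEARS = 3          # simulated period quoted in the text
BASE_CPU_HOURS = 10.0   # approximate CPU time quoted in the text


def cpu_hours(simulated_years):
    """Estimate CPU hours for a simulation covering the given number of years."""
    return BASE_CPU_HOURS * simulated_years / BASE_YEARS


for years in (3, 10, 100, 300):
    print(f"{years:4d} simulated years -> about {cpu_hours(years):7.1f} CPU hours")
```

Even under this optimistic assumption, a simulation spanning a few hundred years needs on the order of a thousand CPU hours on the same processor.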

SOFTWARE COMPONENTS AND TOOLS 

In addition to a need for software tailored 
specifically for grand challenges, one of the 
HPCC Program objectives is to advance 
generic software technology that will have 
broad national impact. DOE plans to form 
collaborative groups among the national 
laboratories, universities, and industry to 
develop and share in this software technology 
effort. This area includes a broad range of
activities, a selection of which are described 
below. 

Massively Parallel Software Systems 

Massively parallel computers will not displace 
traditional supercomputers for the majority of 
users until reliable, multiuser operating systems 
are developed. Such operating systems are 
needed to ensure efficient machine utilization 
and should be developed along with the 
hardware to ensure adequate hardware support 
for operating system functions. The operating 
systems that are currently available for parallel 
computers are still primitive compared to
UNIX and MACH. Although a multitasking,
timesharing system might eventually be 
desirable, the initial operating systems may 
assign each user a subset of the total number of 
nodes on a massively parallel machine. Using 
this approach would avoid problems associated 
with load balancing while timesharing occurs
on some fraction of the nodes. The operating 
system must also ensure the integrity and 
security of each application being executed on
the computer and allow operators to restart all
of the applications after an inadvertent shut-
down. A very important issue is the ability of
an operating system to allow input/output to
occur in parallel. Without such a capability, an 
input/output bottleneck can form, limiting the 
utilization of the machine. It is important to 
capitalize on recent advances in a universal 
parallel, distributed-memory operating system
such as parallel MACH, which will have a
small kernel and a large library of servers
containing most of the UNIX functionality. 
This will allow streamlining the operating 
system, providing only those services on 
individual nodes needed by a given job. The
operating system must provide functionality
tying kernels on individual processors together.
This should include parallel file systems that 
can be smoothly created and accessed from 
several cooperating processors. The advantages 
to both manufacturers and users of a universal 
operating system are obvious: leverage of 
resources and uniformity of environments. 

Portable and Scalable Libraries

Mathematical software has played a vital role 
in making uniprocessors effectively usable by 
scientists and engineers. As more elaborate 
computer architectures arise, the need for more 
sophisticated mathematical software becomes 
acute. Ideally, such mathematical libraries 
should be portable across a variety of architec- 
tures, and scalable libraries for a single 
architecture will be the first step in this 
evolution. To ensure portability across ma-
chines, one may adopt a language for express-
ing vector and parallel constructs, such as an
extended version of Fortran. It is clear, how- 
ever, that such a common language alone 
cannot guarantee "performance portability." 
Until compilers for parallel machines with 
vector or RISC processors mature, avoiding 
substantial degradation across architectures will
still rely on (1) tuning code by inserting
compiler directives, and (2) using basic 
primitives that are written in assembler for 
maximum exploitation of architectural features 
and that are a part of the mathematical library 
of a given machine. To ensure performance
portability (and avoid degradation of perfor-
mance), one needs numerical libraries in which
the algorithms make effective use of the 
primitives that ideally match the architectural 
feature for maximum performance. For 
example, scalability of the algorithms used in 
these libraries can be enhanced by scalable 
primitives. This has been demonstrated, to a 
certain extent, in libraries that deal with dense
matrix computations. However, much remains 
to be done in other areas, such as sparse matrix 
computations. 

Programming Languages/Compilers 

Effective use of high performance computers 
depends heavily on both the programmer's 
ability to express algorithms in a programming
language clearly and succinctly and the
associated compiler's ability to exploit the 
target machine's architecture. There is exten- 
sive scientific debate on the most cost-effective 
language/compiler strategy for parallel archi-
tectures. The principal options are (1) extend
existing languages with parallelism primitives,
(2) advance compiler optimization technology
to find parallelism in existing languages, and
(3) develop functional, object-oriented lan-
guages to replace the old styles. The DOE
laboratories will collaborate with interested 
university and industrial partners to use their 
extensive expertise to push the technology of 
all three options. They will also establish fair
comparisons among the options to determine 
their relative merit. 




[Figure: Reduction to Hessenberg Form. Curves labeled BLAS3 and Fortran, plotted against matrix order n from 256 to 1024.]



Development of Basic Linear Algebra
Subroutines (BLAS3) was based on the idea 
that the addressing locality of a program can 
be greatly enhanced by properly indexing a
set of nested loops. For example, indexing 
through whole rows and columns can cause 
many more cache misses than repeatedly 
indexing through small rectangular blocks 
that fit in cache memory. In the case of the 
Alliant FX/8, cache memory misses caused
performance to fall far short of expectation. 
With BLAS3, performance increased to 
almost peak achievable system performance, 
which is sustained for problems that are not 
cache contained. Performance rates of 
approximately 50 megaflops have been 
observed for the rank-k update BLAS3 
primitive C <- C + AB. The performance of the
block methods that use these primitives is 
also quite high. Rates between 40 and 45 
megaflops have been observed for a block 
LU decomposition algorithm. 
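
The blocking idea described above can be sketched as follows. This is a minimal illustration of the loop structure only, not the BLAS3 implementation: the function names and the block size are hypothetical, and pure-Python lists do not expose the cache behavior that motivates the technique on real hardware.

```python
def matmul_naive(a, b):
    """Reference product: sweeps whole rows of a and whole columns of b."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed one small block-by-block tile at a time.

    The three outer loops pick a tile of c, a, and b; the three inner loops
    stay inside those tiles, so each tile is reused many times before the
    computation moves on.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_blocked(a, b, block=2) == matmul_naive(a, b))
```

In a compiled language the tile size would be chosen so that the three active tiles fit in cache together.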

Top: The performance of an LU factoriza- 
tion implemented in Fortran, using optimized 
BLAS2 primitives and BLAS3 primitives. 

Bottom: The results of the BLAS3-based 
code for reducing a matrix to upper Hessen- 
berg form and the Eispack Fortran routine 
ORTHES after automatic optimization. 

Work supported by DOE, NSF, AFOSR, and IBM.



Load balance is a
critical issue in porting
applications to massively
parallel computers. At
DOE's Sandia National
Laboratories, a dynamic
synchronous load balance
method based on binary
decomposition was used to
balance one million par-
ticles in six seconds on a
1024-processor NCUBE/ten
starting from a random
assignment of particles to
processors; subsequent load
balance updates required
less than 0.1 second each.
Using heterogeneous 
programming in a radar 
simulation application, six 
user programs — a host 
program, dynamic sched- 
uler, ray-tracer, radar 
simulator, image collector, 
and graphics program —
execute simultaneously and
cooperatively on the 1024-
processor NCUBE 2. The
dynamic scheduler per- 
formed asynchronous load 
balancing based on a 
processor hierarchy: the 
resulting code ran about 50 
times faster on the NCUBE
than on a CRAY Y-MP
processor. This scheme has
been recently applied to SDI
tracking problems and other 
complex simulations. 
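
One way to picture a load balance based on binary decomposition is the serial sketch below. It is an assumption-laden illustration, not the Sandia method: the function name, the two-dimensional particles, and the alternating cut direction are all hypothetical choices, and the real scheme runs in parallel across the processors.

```python
import random


def binary_decomposition(particles, levels, axis=0):
    """Split particles into 2**levels groups of nearly equal size.

    particles: list of (x, y) tuples. Each level sorts along one coordinate,
    cuts the list in half, and recurses with the other coordinate.
    """
    if levels == 0:
        return [particles]
    ordered = sorted(particles, key=lambda p: p[axis])
    mid = len(ordered) // 2
    next_axis = 1 - axis
    return (binary_decomposition(ordered[:mid], levels - 1, next_axis)
            + binary_decomposition(ordered[mid:], levels - 1, next_axis))


random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
groups = binary_decomposition(points, levels=3)  # one group per processor
print([len(g) for g in groups])
```

Each group corresponds to one processor's share of the particles; because every cut halves the current list, the group sizes differ by at most one particle.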

Work supported by DOE.

Scientists at Argonne National
Laboratory played a leading part in the
design of Strand, a parallel programming
system distributed by Strand Software
Technologies of Beaverton, Oregon.
Recently, members of Argonne's Mathemat-
ics and Computer Science Division helped
the Aerospace Corporation of El Segundo,
California apply Strand to engineering 
problems. They also helped the corporation 
develop a set of tools for understanding the 
performance of parallel programs.

The parallel processing group at the
Aerospace Corporation originally contacted 
Argonne because they needed parallel pro- 
gramming tools to support parallel versions of 
codes used in their engineering programs. They 
were also interested in extending tools that they 
had developed for understanding the perform- 
ance of sequential programs to work in a 
parallel programming system. 

Argonne scientists helped Aerospace 
Corporation develop their tools into a form 
suitable for parallel performance evaluation. 
The result of this work is a powerful data 
collection and analysis system called Gauge. 
Gauge can be used on a wide variety of 
parallel computers, including hypercubes and 
shared-memory machines. Engineers at 
Aerospace Corporation are now using Strand 
and Gauge to develop parallel versions of
target tracking and rocket design codes. At
Argonne, the Gauge tools have proved
invaluable in projects developing parallel
codes for climate modeling and computational
biology. 

The group at Argonne also advised Strand 
Software Technologies on how to incorporate 
the Gauge tool into the Strand programming 
system. Strand Software Technologies agreed 
to distribute Gauge free of charge with the 
Strand software system, providing wide 
distribution of the tools developed by the 
collaboration. 

The different colors in the image represent 
the amount of work being performed in 
different parts of the parallel program. Black 
represents the most work, whereas red, blue, 
and green represent less work, in that order
(see color bar). 

Work supported by DOE.

Streams and Iterations in a Single
Assignment Language (Sisal) is a general-
purpose functional language for high-
performance computing. The language is
intended to run on both conventional and
novel multiprocessor systems. The lan-
guage's development is a collaborative
effort by Lawrence Livermore National
Laboratory and Colorado State University.
Currently, researchers at more than twenty
academic and research institutions world-
wide are using Sisal.

Functional languages provide a clean 
and easy-to-use parallel programming 
model that facilitates parallel program 
development and simplifies compilation. 
The semantics of functional languages 
isolate the user from the complexities of 
parallel programming. The compiler and 
operating system, not the user, are
responsible for the synchronization,
communications, and scheduling of concur-
rent tasks. Functional languages free the
user to concentrate on the nature of the
algorithm and not its execution. 

Until now, functional semantics carried 
a high performance cost; but recent 
advances in compiler and operating system 
technology have brought Sisal on par with 
Fortran on both scalar and vector shared- 
memory multiprocessors. Given the 
expressive and easy-to-use parallel pro-
gramming model it provides, Sisal is an
attractive alternative to conventional 
programming languages on shared-memory 
multi-processors. 

                       1 Processor         5 Processors
Program     # Lines    Sisal    Fortran    Sisal    Fortran
Weather Code

This table, a comparison of Sisal and Fortran
execution times, shows the execution times (in
seconds) of four scientific programs on the Alliant
FX/80. Clearly, Sisal is as fast as or faster than
Fortran.

Researchers at the University of
Illinois' Center for Supercomputing 
Research and Development have pioneered 
compiler techniques to detect implicit 
parallelism in sequential languages, such as 
Fortran 77. Today, these compiler tech-
niques are used routinely in most
supercomputers. For algorithm kernels, the 
parallel code produced by restructuring 
compilers is comparable to that generated 
manually. The goal of much current
research is a similar achievement for large 
applications. 

Work supported by DOE.



Computational Science Environments 

Much remains to be accomplished in order to 
create effective work environments for compu- 
tational scientists. The issues range from 
programming tools to facilitate error-free code 
generation, to performance analysis and 
enhancement, from pre- and postprocessing of 
data, to effective visualization of multidimen- 
sional time-dependent phenomena, and from 
communication of scientific results to the 
creation of problem-solving environments 
oriented toward particular classes of applica- 
tions and using the specialized idioms and 
language of those applications areas. The 
objective is to make HPCC and the computa- 
tional science approach to problem solving 
more useful for and readily available to large 
numbers of engineers and scientists. This is 
particularly relevant to the transfer of computa- 
tional science technology to industry. 

Tools that merit early consideration include 
common, multimedia user interfaces, graphic 
job monitoring and management aids, parallel 
Fortran and C extensions, parallel multinode 
debuggers, performance analyzers, shared data 
cache systems, transport, hierarchical file stor-
age systems, checkpoint/restart software, and
automatic memory coherency managers.

Los Alamos National Laboratory has
developed MediaView, a generic framework 
for communication via multimedia docu- 
ments. These documents can include not only 
text, line art, and still images, but sound, 
video sequences, and computer-produced 
animations. Also, when cast in digital form, 
the mathematical content of a document can 
be symbolically and numerically manipu-
lated. Thus, one can experiment with the
mathematics, derive new results, and simulate
different situations with different parameters.
Finally, an animated scientific visualization 
can be incorporated in a MediaView docu- 
ment and be examined in situ or electroni- 
cally mailed to a colleague for independent 
study. Thus, MediaView is a communication 
tool that offers new and dramatically different
ways of interacting with others. 

MediaView is easy to use and understand.
It is based on a word processor metaphor,
something familiar to any computer user. In
addition to text, that metaphor is extended to
include several multimedia components. And like 
text, these additional components are subject to 
the same cut, copy, and paste paradigm, making 
them as simple to manipulate as words. As a 
result, powerful and complex MediaView 
documents can be constructed by nonspecialists.

MediaView can be of enormous benefit to 
workers in a distributed but networked HPC
"center." One of many possible examples of use
is exchanging scientific visualizations and 
mathematical analyses. Unlike a conventional 
document, these components appearing in 
MediaView are live and interactive.

The figure is an interactive animator for a 
sequence of images that were produced on a 
supercomputer. The colors represent the density 
of the fluid calculation being simulated here.
The color bar, from cyan to yellow, indicates less 
dense to more dense. Different colors delineate 
structures that are formed in the fluid simulation. 
The observer is viewing a cut-away of a Mach-6 
intergalactic jet simulation, with the green lines 
showing the boundary of the data, had a "slice" 
not been cut away.

Research and development at Oak
Ridge National Laboratory in the area of
performance characterization for parallel 
supercomputers has produced two software 
tools for monitoring and visualizing the
behavior of parallel algorithms. Portable 
Instrumented Communication Library 
(PICL) is a subroutine library that imple-
ments a generic message-passing interface 
on a variety of multiprocessors. Programs 
that use PICL routines for interprocessor 
communication are portable in the sense 
that the same source code can be compiled
and executed on many different parallel
architectures. Optionally, PICL also
provides time-stamped trace data on 
interprocessor communication, processor 
busy/idle times and user-defined events. 
These data serve as input to a second tool, 
called ParaGraph, which is a graphical 
display system for visualizing the behavior 
of parallel algorithms on message-passing
multiprocessor architectures. ParaGraph
uses both color and motion to provide a 
dynamic visual depiction of the behavior of 
the parallel program. ParaGraph provides 
several distinct visual perspectives from
which to view the same performance data, 
in an attempt to gain insights that might be 
missed by any single view. 

Three of the approximately twenty views 
in ParaGraph are shown in the accompany-
ing figure. The circular image (two shades
of red) in the upper-right portion of the 
diagram shows how efficiently the program 
is running on the processors and the load
balance between processors. The larger, 
lighter-shaded polygon indicates the 
maximum efficiency of the program thus far 
for each processor, whereas the smaller, 
darker polygon shows the current efficiency
and load balance. The larger the polygon, 
the more efficient the algorithm, and the 
more circular, the better the load is 
balanced. Ideally, one would want the 
darker polygon to always be large and 
circular and about the same size as the 
lighter polygon.

At the Georgia
Institute of Technology, a 
research group led by 
physicist Uzi Landman has 
been a major user of the 
Department's supercom-
puter facilities for a number
of years. Landman's 
specialty is the computer- 
ized simulation of the 
atomic world, particularly
as it is reflected in the 
properties of materials and 
in the physical and techno-
logical consequences of
their microscopic interac- 
tions. By incorporating 
both classical and quantum
mechanical principles into 
his supercomputer pro-
grams, Landman has been
able to accurately predict 
how various materials 
behave— at the molecular 
and atomic level — under 
various physical conditions 
and to explain how they 
respond when brought into 
contact with each other 
under pressure. Visualiza- 
tion of these simulations is 
providing scientists with 
fundamental new knowl- 
edge of the atomic mecha- 
nisms that underlie the 
behavior of materials. 
Landman, shown in the 
photo, illustrates on a 
monitor how an electrically 
unbalanced cluster of salt 
molecules (Na14Cl13)
becomes stabilized. To 
make up for the missing 
chlorine atom, a free 
electron (indicated by the 
cloud of white dots) 
attaches itself to a sodium 
atom (shown in red) on the 
surface of the cluster and 
begins to pull it away, 
leaving a balanced cluster 
of 13 sodium and 13
chlorine atoms.

Work supported by DOE.

Distributed Computing

Computational science is a diverse and geo-
graphically distributed activity. As previously
mentioned and as discussed further below, one 
of the goals of the HPCC Program is to 
establish HPC Research Centers that will be 
used by many researchers via high-speed 
networks. Many researchers will also have 
significant local resources in the form of 
powerful workstations and local shared fast 
microprocessor systems. Effective utilization of 
these distributed DOE resources in attacking
grand challenge problems will require software 
to support distributed systems. Distributed 
systems are systems in which two or more 
computers cooperate on the solution of the 
problem. For example, one may want to
distribute computations in order to match
algorithms for specific subproblems with the 
most appropriate computer architecture. The 
goal of distributed systems research is to create 
a uniform user and programming environment 
across heterogeneous systems; in other words,
to make it as easy to use a collection of 
computers on a given problem as it is to use a 
single computer. Important areas of research 
and development include distributed program- 
ming environments to make it easy to create, 
debug, and run distributed scientific applica-
tions; distributed operating system techniques
for control of distributed resources; distributed 
file system management and archival support 
tools to facilitate utilization of high perform- 
ance networks; and mechanisms for effective 
distributed software control and update. 

Visualization and Imaging 

The purpose of HPCC technology is to aug-
ment the work of scientists and engineers.
Their interface to these systems is of central
importance if they are going to be most 
effective. Therefore, it is important to provide 
the hardware and software technology to allow 
for input of a variety of image data and for 
output that supports the use of their senses, 
particularly visual, for gaining maximum
problem insight. 

Research and development is needed in 
input, storage, and analysis of image data and
in the output of data in appropriate visual and 
other sensory forms. The output techniques 
needed include effective management of three-



Supercomputers do not operate as
stand-alone devices. Typically they are 
embedded in a heterogeneous shared 
resource environment consisting of many 
services, for example: mass storage, input/
output, authentication, graphics engines,
terminals, and workstations. During the
1970s Lawrence Livermore National
Laboratory (LLNL) played a leading role in 
developing the shared resource server model 
of a supercomputer environment. During the 
1980s LLNL developed a distributed 
operating system architecture for a 
supercomputer environment. The goal of this 
research and development was to create the
programming and user-at-a-terminal logical
model of a single shared and nonshared 
memory multiprocessor system providing 
transparent resource access. The architecture 
was implemented as the native client-server, 
multiprocessing operating system on CRAY X- 
MP and Y-MP systems that has been in 
production for over two years. Several network
services were implemented within the architec- 
ture. For example, the storage part of the 
architecture was implemented on UNIX systems 
as a distributable hierarchical mass storage 
system. LLNL is collaborating with General 
Atomics to make this storage system available 
to industry and other supercomputer centers. 
The architecture also provided the basis for the 
IEEE Distributed Mass Storage Reference 
Model.

Work supported by DOE.

dimensional data and display, browsing,
connection of output images to quantitative
data, and comparison of current and past or
alternative approaches. Visualization and 
imaging are dependent on high performance 
networks, effective data-compression tech-
niques, scientific databases, and high-definition
recording and display technology. 

Very Large Scientific Databases

Many of the grand challenge applications of
interest to the DOE— such as global climate 
modeling, understanding the structure of 
matter, and human genome — potentially 
involve very large scientific databases contain- 
ing hundreds or perhaps thousands of terabytes 
of data. These databases will require hierar- 
chies of storage media, which in turn will 
require new approaches to database structuring, 
storage, and searching. In addition, improved 
interfaces are needed to facilitate scientists' 
querying, browsing, studying, and transforming
the databases to enable better understanding of 
the physical phenomena portrayed in them. 
Standards need to be developed in this area to 
aid collaboration among scientists and the 
aggregation of logically single databases from 
distributed databases being generated at many 
experimental locations. 

COMPUTATIONAL TECHNIQUES 

The principal objective in this research area 
will be to improve the performance of numeri-
cal simulations on parallel computing systems.
Research in this area should not only encom- 
pass algorithm kernels and subroutines, but 
should also include full-scale scientific and 
engineering applications of contemporary
interest. 

Computational Methods and Algorithms 

DOE will support a comprehensive research
program in advanced computational techniques. 
The principal goal of this effort will be to 
provide methodologies and algorithms to
enable the effective solution of grand challenge
problems and to promote U.S. industrial 
competitiveness. The focus will be on methods 
that allow exploitation of the prototypical
architectures deployed at the HPC Research
Centers. Creation of the high performance 
algorithms, to be incorporated in the methodol- 
ogy, will require, in many instances, new 
numerical techniques that exploit greater 
parallelism than can be simply extracted from 
parallelized versions of existing sequential 
algorithms. 



Applications Algorithm Design Systems 

In addition to developing new methods and 
algorithms, it is important to package these 
techniques and make them easily usable. An 
attractive area for research is how to create 
parallel design methodologies for large-scale 
scientific applications that include modular and 
well-integrated software tools, user interfaces, 
standards and protocols. Such design method- 
ologies will expose the similarities and impor- 
tant distinctions between the hardware architec- 
tures and provide mechanisms to promote 
portability. New fundamental mathematical
algorithms will also be posed in such frame- 
works. 

HIGH PERFORMANCE COMPUTING 
RESEARCH CENTERS 

DOE will contribute to the support of HPC Re-
search Centers. The main purpose of these cen-
ters will be to enable computational scientists
to explore the effectiveness of new full-scale 
high performance architectures in solving grand 
challenge problems. By using full-scale archi-
tectures, code developers can carry out experi- 
mental code design, with the expectation of ad- 
equate compute power to solve the problem. In 
order for researchers at remote locations to 
make effective use of these facilities, each cen- 
ter will become a major node on a National Re- 
search and Education Network backbone. Col- 
laborative consortia of DOE-supported HPC
Research Centers, industry (both user industries 
and computing and communications indus- 
tries), and universities will be very highly en-
couraged. 

The sponsorship of the HPC Research Cen- 
ters is an area in which DOE has a long history 
of excellence and leadership. DOE national 
laboratories have recognized the importance of 
computational experiments and analysis to 
complement expensive physical experiments. 
Many of the DOE national laboratories con- 
tinue to operate major production-oriented su-
percomputing centers, as well as a number of 
parallel computing research centers. There are 
many types of parallel computer architectures 
that will need evaluation. Because several al- 
ready have displayed high potential at outper- 
forming contemporary vector multiprocessors, 
it is reasonable to expect that there will be addi-
tional such architectures identified in the High 
Performance Computing Systems component 
of the HPCC Program. Among the concepts to 
be thoroughly evaluated are SIMD, MIMD, and
hybrid processors and shared, distributed, and 
hierarchical memories. Each hardware architec-



The development of
a comprehensive and 
general information system 
for molecular biologists — 
the Chromosome Informa- 
tion System (CIS)— is a 
major part of the computing 
effort at the Lawrence 
Berkeley Laboratory (LBL)
Human Genome Center. In 
this system, one consistent 
graphical user interface 
will transparently interact 
with multiple underlying 
databases. The separation 
of user actions on the 
biological data objects and
the data storage enhances
the system flexibility and
functionality. The descrip- 
tion of the data, and the 
operations on the data, are 
specified in a high-level 
language that captures the 
biological concepts while 
permitting the use of 
commercial data manage- 
ment systems at the lower 
levels. Through integration
of specialized databases
(e.g., an image database, a
sequence database, and a
map database), biological
information may be
accessed at many levels of 
abstraction. 

This work is based on 
concepts developed by the 
Data Management research 
group at LBL. The CIS has
been implemented by a
collaboration of the Data
Management group and 
biologists from the LBL 
Human Genome Center. 

Work supported by DOE.

Efficient modeling of three-dimen-
sional, time-dependent fluid flows plays a
critical role in many grand challenge research
problems. Adequate resolution of these flows 
using conventional spatial discretizations 
would require computational effort far in 
excess of the most optimistic projections for 
near-term growth in computational capabili-
ties. Scientists at Lawrence Livermore 
National Laboratory, the Courant Institute, 
and Los Alamos National Laboratory are 
developing adaptive methods that focus 
computational effort where it is most needed,
resulting in a dramatic increase in the
computational efficiency and enabling the 
solution of realistic 3D flow problems. 

One type of adaptive technique is local 
adaptive mesh refinement, in which the 
computational mesh dynamically changes as 
a function of space, time, and data to 
maintain a fixed level of accuracy in the 
calculation. The figure shows computational 
results obtained using this method to model 
the collapse of an ellipsoidal cloud of Freon 
when it is hit by a shock wave, where yellow 
indicates the location and concentration of 
the Freon. The use of adaptive refinement for
this problem reduced computational costs by 
more than an order of magnitude, resulting in 
a savings of hundreds of Cray hours. 
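
As a rough illustration of the idea, and not the code used by the laboratories, the sketch below refines a one-dimensional mesh only where a sampled function changes rapidly. The function name refine_mesh, the jump tolerance, the level limit, and the tanh test profile are all assumptions made for this example; the method described above also adapts in time, which this static sketch does not attempt.

```python
import math

def refine_mesh(f, a, b, n_coarse=8, tol=0.1, max_level=6):
    """Return sorted mesh points on [a, b], bisecting cells where f jumps by more than tol."""
    # Start from a uniform coarse mesh; each cell is (left, right, level).
    h = (b - a) / n_coarse
    cells = [(a + i * h, a + (i + 1) * h, 0) for i in range(n_coarse)]
    points = {a, b}
    while cells:
        left, right, level = cells.pop()
        points.update((left, right))
        # Refinement criterion (hypothetical): the change in f across the cell is too large.
        if abs(f(right) - f(left)) > tol and level < max_level:
            mid = 0.5 * (left + right)
            cells.append((left, mid, level + 1))
            cells.append((mid, right, level + 1))
    return sorted(points)

def front(x):
    # A steep profile near x = 0.5 stands in for a shock front.
    return math.tanh(50.0 * (x - 0.5))

mesh = refine_mesh(front, 0.0, 1.0)
# Points a uniform mesh would need to reach the finest spacing everywhere.
uniform_points = 8 * 2**6 + 1
```

The refined mesh keeps the coarse spacing away from the front and reaches the finest spacing only next to it, so it uses far fewer points than the uniform mesh of the same resolution.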

Work supported by DOE, DARPA, NSF, AFOSR, and DNA.



ture type will be evaluated with respect to competing
software architectures for the various
grand challenge applications. 

The activities at each center will be designed
to support interdisciplinary and interinstitutional
collaborations. A key ingredient will be a
critical mass of in-house research conducted by
focused teams of applications scientists,
computational mathematicians, and computer
scientists. These collaborative teams will direct their
efforts toward the solution of grand challenge
problems of interest to DOE. They will be
responsible for maintaining a dialogue with
universities, industry, and other laboratories and
centers in order to maximize the dissemination
of information and avoid unnecessary
duplication of effort.

The HPC Research Centers will also play an 
important role in computational science 
education. This aspect of their mission is 
expected to go well beyond the usual training 
courses in computing and to embrace such 
issues as computational science curriculum 
design at the university level and introductory 
and motivational programs and material for 
high school and elementary students and their 
teachers. Clearly, accomplishing these activities 
will require strong collaboration with the 
education community.

During FY 1985, the U.S. Congress
mandated a study of computer
networking requirements of the U.S.
research community. The preparation
of such a far-reaching study, together
with the growing widespread remote access to
supercomputers through federal supercomputer
programs, such as those in the DOE and NSF,
focused the attention of the U.S. research
community on the need for more capable,
high-speed computer networks.

Within the DOE, computer networks had
already been used extensively for specific
applications and programs, primarily supporting
Fusion and High Energy Physics, but these
networks were mostly incompatible and lacking in
capacity. Because of this and because of a
recognized and significant increase in networking
requirements, the Energy Research (ER)
community endorsed a proposal to create the
Energy Sciences Network (ESNet). The ESNet
has been developed to be compatible with
existing network requirements while providing
connectivity and interoperability to other
federal research networks in addition to a
nondisruptive transition path to emerging international
network standards.

The ESNet is the vehicle through which the 
ER community has become a full partner in the 
Internet community of computer networks and 
through which the ER community will become 
an integral part of the proposed National Re- 
search and Education Network (NREN). In 
fact, the ESNet currently incorporates access by

ESNet is currently a T1-based (1.5
Mbit/s) data communications network
supporting more than twenty major Office of
Energy Research (OER) sites directly
connected to the backbone. It is a multiprotocol
network supporting two major families of
data communications protocols, DECnet
and the Internet protocol.

The five major OER programs supported
are Basic Energy Sciences, Health and
Environmental Research, High Energy and
Nuclear Physics, Fusion Energy, and the
Superconducting Super Collider.



ESNet is engineered, installed, and 
operated by the networking staff of the 
National Energy Research Supercomputer 
Center (NERSC), located at Lawrence
Livermore National Laboratory. Initial
deployment of the T1 circuits began in late
1989, and the network became fully operational with both
protocol families by early 1990. Since
becoming operational, total traffic on the
network has doubled about every six months.
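
To put that doubling rate in perspective, the small calculation below converts a six-month doubling time into growth over longer periods. It is only an illustrative sketch: the function name is hypothetical, and it assumes the doubling continues at a steady exponential rate, which the report does not claim.

```python
def growth_factor(months, doubling_months=6.0):
    """Traffic multiplier after `months`, assuming steady exponential doubling."""
    return 2.0 ** (months / doubling_months)

one_year = growth_factor(12)     # traffic quadruples in one year
three_years = growth_factor(36)  # a 64-fold increase in three years
```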

In addition to interconnecting the major
OER-supported sites (shown on the map),
ESNet provides access to several regional
networks, has international connections to
Japan and Europe, and currently connects to
two Federal Inter-agency exchange (FIX) 
points providing interconnects to other 
national backbone networks, including the 
NASA Science Network and the National 
Science Foundation Network (NSFnet). 




many ER collaborators through the NSFnet and 
through regional networks as was proposed in 
the ESNet Program Plan, June 1987.

These networks, along with ESNet and oth- 
ers, are generally considered to be the precursor 
to the NREN. 

The ESNet has been proposed by the DOE
within the framework of the HPCC Program, of
which NREN is a major component. The
HPCC Program is a balanced computing
research proposal that includes research in HPCC
systems, software technology, and computer
networks as well as funding for human
resources and the NREN. The NREN, as
proposed, will be a computer communications
network that interconnects educational
institutions; national laboratories; non-profit
institutions; government facilities; commercial
organizations engaged in government supported
research; and unique national scientific and
scholarly resources such as supercomputer
centers, libraries, etc.

The NREN will provide high-speed
communications access to over 1300 institutions
across the U.S. within the initial planning
period, and is proposed to offer sufficient
capacity, performance, and functionality so that the
physical distance between institutions is no
longer a barrier to effective collaboration. The
NREN will support access to HPCC facilities

The ESNet was
developed after careful 
definition and documenta- 
tion of user requirements 
from all of the Energy 
Research program areas. 
This user requirements 
orientation led to an ESNet 
implementation that 
departed from other 
internet implementations. 
Most notably, the ESNet 
was developed to be a 
multiprotocol network to 
preserve Energy Research 
user applications investments
in existing network
implementations while 
allowing full interoperabil- 
ity with other internet 
implementations, such as 
the NASA Science Internet 
and the NSFnet. Recently, 
the Internet Advisory
Board has adopted the 
concept of incorporating a 
multiprotocol approach to 
allow a nondisruptive 
transition to international 
standards. 

Work supported by DOE.



and services such as full-motion video, rapid
transfer of high-resolution images, real-time
display of time-dependent graphics, remote
operation of experiments, and advanced
information sharing and exchange. The NREN is also
intended to incorporate advanced security and a
uniform network interface to domestic users as
well as a standard interface to international
research networks.

The NREN proposes to achieve economies
of scale by serving many federal agencies,
industrial R&D centers, and university campuses.
Although the federal government will provide a
substantial direct investment in the NREN, it is
important to recognize that a large indirect
investment will be made by academic and
industrial institutions and other networks that will
connect to the NREN. The DOE NREN budget
is for the direct support of DOE laboratory and
other DOE-funded research facility connections
to the NREN, per the existing federal budget
process. This DOE program will complement
other NREN deployment, funded and
coordinated by the NSF, to the broader research and
education community.

The ESNet is user requirement driven within
the framework of the DOE mission. The DOE
HPCC Program proposal for the NREN is
applications and user oriented, both with regard to
the operations and the gigabit research areas.
The DOE will participate in the NREN
management to ensure a requirements-driven approach
with strong user involvement and site
coordination. Because of the common mission
orientation and common requirements of the ER
community, the consensus of this community is that
the ESNet would continue to be viewed as a
large, logical community of interest network
within the NREN context, even if in the longer
term ESNet were no longer a separate but
compatible physical network.

Several of the most demanding functional 
requirements for DOE networking capabilities
are highlighted below. 

The ER community will have a continuing 
need for multiprotocol support throughout the 
initial planning period. Although there is a
constant transition to UNIX-like systems and TCP/IP
network protocol usage, a wholesale
conversion to these systems is impractical and
expensive and cannot be accomplished within three
years. During that time, OSI standard protocols
will have been incorporated into the ESNet,
where they will co-exist with TCP/IP, X.25,
and DECnet.

Videoconferencing has been initiated within
the ESNet in a test mode. In recent years, the
cost of video technology, the use of compression
techniques for video transmission, and the
use of packetized video have spurred great
interest in using video for research collaboration
support. The ESNet video pilot project has
received very positive evaluations from all
involved.

The benefits of video are even more evident
as one examines the logistics of the many
international collaborations in which ER
programs are involved. Video enables these
collaborators and negotiators to meet on a
weekly and even daily basis, which could not
be done otherwise.

The use of windowing technologies and
other software tools has enabled more facile
and productive use of remote computing and
control facilities. These tools 'extend' the
capability of a principal investigator's local
workstation to these remote facilities in such a
fashion that they can be used more readily in
the investigation of complex problems. The
operation of these tools in a distributed
environment, however, is a very taxing network
requirement.

In addition to windowing tools, there is a
growing requirement for distributed programming
and debugging tools. Remote procedure
call functionality has recently been implemented
on the supercomputer systems at the
National Energy Research Supercomputer
Center, and as this functionality achieves more
widespread use, it is anticipated that the
distributed processes will place a heavy
demand on ESNet bandwidth. It will be very
important to have the ability to remotely
analyze system performance with regard to
parallelization, individual system node
processes, and throughput, etc., especially when
evaluating prototype 100-gigaflops and
teraflops systems as part of the HPCC Program.
This will place even more demands on network
bandwidth (possibly another order of magnitude)
in the FY timeframe.

Proposals for the next large experiments
within the Fusion Program for the International
Thermonuclear Experimental Reactor and
within the High-Energy Physics Program for
the Superconducting Super Collider include the
requirement for distributed control, i.e., remote
operation, of the experiment. Presently, such
requirements for experiment control require the
most capable local area network implementations.
These requirements for the wide area
ESNet, in the FY timeframe, will
require bandwidths in the gigabit range.
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ME NETWORKING RESEARCH PLAN 

The DOE HPCC networking plan must 
encompass at least the following technical 
research issues to meet anticipated near-term
needs for using ESNet and NREN as they 
evolve. 

Performance 

Virtually all the characterization to date of the
technical parameters of NREN has centered 
around bandwidth. However, additional factors 
such as response time and availability can be 
equally, or even more, important for many 
applications and must be given corresponding 
consideration. In addition, support will be
provided for research in network performance 
evaluation. 

Policy-Based Routing 

Policy-based routing is the term used to denote
the routing of traffic, taking into account the
usage policies of both the using entity and the
service provider. This issue is still in an early
research phase. Topics of study include 
determining mechanisms and parameters to 
allow independent domains to protect them- 
selves from unwanted traffic and mechanisms 
to determine paths through the system meeting 
client needs. 
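
One simple way to picture the problem, offered only as a sketch and not as a mechanism proposed in this plan, is a cheapest-path search that refuses to cross any domain whose policy excludes the traffic being routed. The function policy_route, the topology, the link costs, and the traffic class names below are all hypothetical.

```python
import heapq

def policy_route(links, transit_policy, src, dst, traffic_class):
    """Cheapest path from src to dst crossing only domains that accept traffic_class.

    links: {domain: {neighbor: cost}}
    transit_policy: {domain: set of traffic classes it agrees to carry}
    Returns (total_cost, path), or None if no policy-compliant path exists.
    """
    def allowed(domain):
        # Endpoints always handle their own traffic; transit domains consult policy.
        return domain in (src, dst) or traffic_class in transit_policy.get(domain, set())

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in visited and allowed(neighbor):
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical topology: domain "B" is the short route but refuses commercial traffic.
links = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}
transit_policy = {"B": {"research"}, "C": {"research", "commercial"}}
research_route = policy_route(links, transit_policy, "A", "D", "research")
commercial_route = policy_route(links, transit_policy, "A", "D", "commercial")
```

In this example research traffic takes the short path through B, while commercial traffic is steered through C at higher cost. A real design would also have to address how domains advertise and enforce their policies, which is the open research question noted above.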

Open Systems Interconnect (OSI) 
Integration 

The OSI networking standards need to be 
incorporated into the NREN architecture and 
capabilities. 

Multiprotocol Support 

A variety of protocol suites are currently used 
by various user communities. For DOE, these 
include DOD IP, DECnet (Phase 4), SNA,
X.25, and others. User-level requirements
based on these protocols will continue for some 
time. The network must allow for support of 
these protocols if all or most users are to be 
supported in their research. Design of gateway 
and transition mechanisms between protocol 
suites will require research and development. 
New higher performance protocols will also be 
investigated. 

Network Management and Operations 

One of the most important factors in providing 
gigabit network service across a large community
will be the ability to establish a hierarchical
network management and operations
enterprise that will be able to effectively deal 
with network management issues in a very 
diverse environment.

Problem Resolution Tools

User level problems in networking are becom- 
ing increasingly sophisticated and subtle, while 
at the same time the complexity of the network 
environment is rapidly increasing. Resolution 
of these problems will require a sophisticated 
arsenal of tools that will help to confirm, 
analyze, and pinpoint problems at all levels of
network services. 

Privacy and Security

Protecting the NREN and its attached resources 
against attack is an important requirement. 
Protection and monitoring mechanisms must be 
part of its design.

The above is a partial list of considerations that 
must be addressed by the NREN, in addition to
the major charter of providing state-of-the-art 
performance to the HPCC Program. An 
interagency NREN will evolve over time, 
adding these capabilities as they can be reliably 
provided. It is unlikely that these divergent 
requirements can be met with other than a 
hierarchical architecture, incorporating multiple 
networking entities and multiple levels of 
networking components, many of which may 
be in place already. 

GIGABITS RESEARCH AND DEVELOPMENT 

The leadership for gigabit network research for 
the Federal HPCC Program will be performed 
by DARPA. DOE will support a modest effort 
in gigabit network research: to support grand 
challenge applications, to provide connectivity 
within and among its HPC Centers, to contrib- 
ute the high performance networking expertise 
of the DOE laboratories to the HPCC Program 
generally, and to ensure that it maintains the 
expertise to upgrade and integrate its ESNet 
with the emerging gigabit NREN. The gigabit
network research program below meets these 
needs. The previous section outlined key 
research issues needed for the interagency 
interim NREN effort: they apply equally to the 
full NREN effort. Below are listed additional 
gigabit research and development efforts that
DOE will pursue. 

Local Gigabit Networks 

For a national gigabit backbone to be fully
effective, local gigabit networks are required to 
interconnect the local resources of the HPC 
Centers. Two main approaches to local gigabit 
interconnection currently exist: the circuit 
switch approach and the bus approach. For 
each, there are many unanswered questions 

The CASA wide-area testbed is one of
five very high speed communication network
projects led by the Corporation for National
Research Initiatives (CNRI). CASA is truly an
interagency collaboration involving the
California Institute of Technology, DOE's Los
Alamos National Laboratory, NASA's Jet
Propulsion Laboratory, and NSF's San Diego
Supercomputer Center. MCI, Pacific Bell, and
U.S. West are also collaborating with the
CASA testbed.

The primary research goal of this collaboration
is the effective simulation of grand challenge
problems on geographically dispersed
supercomputing resources connected via very high
speed networks. The communications environment
envisioned for CASA relies on the Los
Alamos-proposed High Performance Parallel
Interface (HIPPI) ANSI standard and will
leverage research at Los Alamos on HIPPI-based
crossbar switches, networking software,
and protocol processors. The CASA project will
develop several prototype distributed
supercomputing applications in chemistry, geophysics,
and global climate modeling. The present
supercomputing resources available in the
collaboration are Thinking Machines Corporation's
CM-2, Cray Research Incorporated's Y-MPs,
and several hypercube architectures. As
with the other testbeds, the CASA effort will
contribute to the research and development
base required for the National Research and
Education Network.




Work supported by NSF and DARPA.



about the best design. The circuit switch
approach looks particularly attractive because
of its excellent bandwidth scaling, security, and
connection setup overlap characteristics. DOE,
using existing gigabit circuit switch expertise
and equipment within its national laboratories,
will investigate circuit switch architectural
questions, such as how is the number of
connections economically scaled, where is the
optimum placement of functionality, and what
is the effect of various host and network
interface I/O architectures on actual throughput.



Gigabit Applications 

Prototype local and long-haul gigabit network
applications need to be developed that demonstrate
the utility of these networks for enhancing
the scientists' productivity, achieving
computational power through distributed
resource sharing, or enabling other, possibly
unforeseen, significant scientific or economic
advantages for the grand challenge applications.
Developing distributed applications,
using resources of several HPC Centers
cooperatively on a single problem, may require

The Multiple Crossbar Network
(MCN) testbed is a gigabit per second
testbed network at Los Alamos National
Laboratory that interconnects computing
resources in the Advanced Computing
Laboratory, Central Computing Facility, and
Laboratory Data and Communications
Center. It will be extended in approximately
a year to include the wide-area CASA gigabit
per second testbed. CASA sites include
Caltech, the Jet Propulsion Laboratory, and
the San Diego Supercomputer Center.

The purpose of the MCN testbed is to 
experiment with gigabit per second network- 
ing technology, to understand how to apply 
this technology to leading-edge scientific 
problems, and to promote closer interaction 
with industry. The local distribution system 
for the MCN testbed is based on work done 
in the Laboratory's Network Engineering 
Group on the High-Performance Parallel 
Interface (HIPPI) and HIPPI-based switching
systems. A Network Systems Corporation
crossbar switch with a 6.4 gigabit per second
aggregate throughput and less than 1
microsecond switching time is being coupled
to Los Alamos-designed intelligent network
interfaces to assemble a high-performance
HIPPI switching system called CP*. Several
CP* systems will be interconnected with
fiber-optic links to assemble a multiple
crossbar network.

This network will interconnect supercomputers
such as CRAY X-MPs and Y-MPs,
Thinking Machines CM-2, workstations from
Sun and IBM, framebuffers, and high-performance
disk systems. This extensive
collection of systems with HIPPI interfaces
provides a unique facility to do performance
and interoperability testing of their HIPPI
implementations.

The applications for this testbed include 
visualization of high-resolution (1024 x 1024 
x 24) images at video rates (24 frames/ 
second), distributed supercomputing using 
software tools such as Express and ISIS, and 
data motion. This testbed will provide 
facilities to explore the applications and 
networking requirements for high-perform- 
ance computing systems of the 1990s. 
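
The visualization figures quoted above imply a data rate that can be worked out directly. The short calculation below is an illustrative sketch that assumes uncompressed frames and ignores any protocol overhead; the variable names are our own.

```python
width, height = 1024, 1024   # pixels per frame
bits_per_pixel = 24          # color depth
frames_per_second = 24       # video rate

bits_per_frame = width * height * bits_per_pixel        # 25,165,824 bits
bits_per_second = bits_per_frame * frames_per_second    # 603,979,776 bits per second
gigabits_per_second = bits_per_second / 1e9             # roughly 0.6 Gbit/s
```

A single uncompressed image stream of this kind therefore occupies well over half of a one gigabit per second link, which is why such applications call for gigabit networking.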

Work supported by DOE.

To improve network
congestion control, the 
Network Research group at 
Lawrence Berkeley 
Laboratory developed 
algorithms for slow-start, 
dynamic window allocation 
and improved round-trip 
estimators. These improvements
enhanced performance
over the Internet
(loaded) by factors of 2 to
100, and the technology
was adopted by virtually
all major computer 
manufacturers. 
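
The flavor of slow-start can be conveyed with a simplified sketch. This is not the Lawrence Berkeley Laboratory code: it counts the window in whole segments, updates it once per round trip, and adds a one-segment-per-round-trip growth phase above the threshold, a rule assumed here for illustration that this report does not describe. The function name, the initial threshold, and the loss event are likewise hypothetical.

```python
def next_window(cwnd, ssthresh, loss):
    """Advance the congestion window (in segments) by one round trip.

    Returns the new (cwnd, ssthresh) pair.
    """
    if loss:
        # On loss, remember half the current window and restart from one segment.
        return 1, max(cwnd // 2, 2)
    if cwnd < ssthresh:
        # Slow start: the window doubles every round trip, up to the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Past the threshold, probe for bandwidth one segment at a time.
    return cwnd + 1, ssthresh

cwnd, ssthresh = 1, 16
history = []
for rtt in range(10):
    loss = (rtt == 6)  # hypothetical loss event in the seventh round trip
    history.append(cwnd)
    cwnd, ssthresh = next_window(cwnd, ssthresh, loss)
```

The recorded history shows the window opening quickly from one segment, growing cautiously once it passes the threshold, and collapsing back to one segment after the loss, so the sender never presents a loaded network with a sudden burst.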

The group also devel- 
oped a header compression 
algorithm to enable 
practical implementations 
of serial line IP protocol, 
which permits computers to 
act as network hosts over
modems, thereby providing
simpler, more reliable, and
more secure data connections.
This technology, too,
has been transferred to
industry. 

Other research includes 
development of a TCP 
header prediction algorithm
to increase throughput
on high-speed networks,
investigation of
policy-based routing, and 
development of a new 
algorithm that allows 
network addresses to have 
arbitrarily complex 
structure yet permits 
routing look-ups to be 
completed in constant time. 
This algorithm— the only 
one known to handle ISO 
addresses efficiently— will 
be included in 4.4 BSD
UNIX. 

Work supported by DOE.

new mathematical algorithms, new application
structuring, and new interprocess communica- 
tion techniques and protocols. 

Protocols and System Software 

With high performance computers and gigabit 
local area networks in place, research will be
performed to measure communication perform- 
ance and to identify bottlenecks in applications, 
operating systems, and communication protocol 
architectures and implementations. These 
studies will lead to experiments with new
approaches to operating system and protocol 
design and implementation. Extensions of 
parallel computing tools to enable applications 
programs to run on multiple hosts linked with 
gigabit wide-area networks will be investi- 
gated. 

Gigabit Testbeds 

Because it is important to experiment with 
alternative gigabit networking approaches, 
local gigabit testbeds will be set up associated 
with the HPC Centers. DARPA and NSF have 
already initiated a significant Gigabit Testbed 
Research Program. DOE-supported sites 
participate in these national or regional area 
testbeds. DOE will provide additional testbed 
resources between sites for networking 
application experimentation that does not 
interfere with production networks. 



BASIC RESEARCH AND HUMAN RESOURCES 

This component of DOE's HPCC Program
will build on the existing DOE applied
mathematical sciences base program
and recognizes that a portion of
the HPCC funding must go toward long-range 
investment in the future. While the other three 
components are designed for a near- or inter- 
mediate-term impact, basic research funding 
must increase for the long-term health of the 
field. Likewise, to develop the human resource 
potential of the future, investment in education 
must accelerate, beginning at the high school 
level or below and continuing through life-long 
training. In both basic research and education,
DOE has a long history of contributions and 
experience from which to draw. Our plan for 
the HPCC Program uses this base and builds 
upon it. 

BASIC RESEARCH 

Basic research supports all other aspects of the 
HPCC Program. The topics below address 
fundamental aspects of computing and its 
applications. 

New Algorithms 

Part of the vitality of HPCC depends on a 
steady influx of new algorithms and mathematical
models. Indeed, the current generation
of practical, reliable algorithms, which are the
tools for attacking the grand challenges, were
once exotic new techniques designed and tested 
on model problems. DOE has long provided a 
natural and effective platform for research into 
new algorithms. Techniques that have grown 
out of DOE's basic research program now 
serve as general tools for hundreds of different 
applications. The continued renewal of creative 
algorithms and mathematical models is one of 
DOE's primary strengths, and the HPCC 
Program will exploit this strength. 

Parallel Numerical Algorithms 

Much of HPCC is scientific numerical model- 
ing. As the sophistication of the physics models 
increases, the complexity of numerical prob- 
lems expands. Parallel computers magnify the 
issue. We need continuous development of the 
best parallel numerical methods to ensure that 
we can truly apply the power of our most 
advanced computers to model the needed 
physics correctly.

Matching Algorithms to Architectures

Research on HPCC has produced a diverse set 
of multiprocessor designs. Each design seems 
to favor certain types of algorithms and perform 
poorly on others. The relationship between 
good matches is often not obvious at the outset.
We need to have a better understanding of how 
to best select and use a particular machine for 
solving a spectrum of different computational 
problems. 

Performance Measurement and Analysis 

When a parallel program runs poorly on a 
specific machine, the cause of the slowness can 
be baffling. It could be the algorithm, the 
translator, the partitioning of work, communication
bottlenecks, system overheads, hardware
failures, or any number of other causes. 
Moreover, many current analysis techniques 
alter the performance as they try to measure it. 
Mechanisms for accurate and timely identifica- 
tion of the cause of poor performance must be 
improved. Successful use of large numbers of 
processors will depend on progress here. 

Novel Applications 

Many important applications of HPCC remain 
to be discovered and explored. Those who best 
understand the potential of HPCC must ensure 
that a process for continually exploring new 
applications is developed and implemented.
When feasibility has been demonstrated and 
documented for a novel application, responsi- 
bility for its continued exploitation will transfer 
to the affected applications discipline. 

Theory of Parallel Computing Complexity

Traditional complexity work provides a 
foundation for identifying efficient sequential 
algorithms, but it does not directly apply to 
parallel machines. Parallel complexity models 
often make unrealistic simplifications, such as 
an infinite number of processors or an infinite 
supply of memory, an assumption that leads to 
many impractical algorithms. We need a solid 
foundation for parallel computing algorithms 
that will accurately represent their performance 
on large, but finite, numbers of processors and 
various types of parallel computers. 

Software Productivity and Reliability 

The cost of software for conventional computers
remains very high. The software cost for
parallel systems will be substantially greater. 
Likewise, software reliability is often question- 
able. We need to develop a practical methodol- 
ogy for building high-quality software that can 



In recent years, DOE researchers have
developed and analyzed advanced numerical
methods for solving the large-scale
nonlinear differential and algebraic
systems that arise during the numerical
solution of ordinary and partial differential
equations. This research has applications
in a diverse range of problem areas,
including circuit simulation, computational
electromagnetics, computer-aided design of
mechanical systems, modeling of chemically
reacting flow, power systems, chemical
vapor deposition, and optimal control.
The development of effective solution
techniques has required a new understanding
of the mathematical structure of
differential equation systems, as well as a
rethinking of algorithms. For extremely
large systems, new techniques for large-scale
linear and nonlinear systems have
been developed.

New algorithms and methodologies have
been transferred to industry in part through
the development of high-quality mathematical
software. Software developed by DOE
for the solution of these problems now has
hundreds of users in industry, government,
and academia; variants of this software
have been incorporated into several
commercial codes. The analysis and
software have directly impacted solution
techniques for the dynamic analysis of
mechanical systems, trajectory control, and
for a wide variety of chemical engineering
applications. Second-generation algorithms
and software are designed to attack
extremely large systems. Effectiveness of
these techniques was recently demonstrated
on a 2D laser oscillator model at Lawrence
Livermore National Laboratory with 38,000
unknowns. Work is now under way to
exploit the excellent potential for parallelization
of these techniques.

Work supported by DOE, ARO, AFOSR, NASF,
Aerospace Corp., and Exxon.

Performance variations in modern supercomputers
can be visualized by plotting the
megaflops delivered on the Perfect Benchmarks.
Each dot in the figure represents the
megaflops rate of the indicated machine on one
of the Perfect codes. The gaps between theoretical
peak rates and actual rates delivered on
real applications demonstrate the need for
comprehensive performance studies of the
relationships between supercomputer architectures
and applications and systems software.

Work supported by DOE, NSF, and NASA.

Most computer benchmarks are
simple, fixed-size problems that are timed for
various machines. Ames Laboratory researchers
have recently completed the design of the
first computer benchmark based on fixed-time
comparison principles. Called SLALOM, for
Scalable, Language-independent, Ames
Laboratory One-minute Measurement, the
benchmark adjusts the number of unknown
variables in a complete radiation transfer
problem such that the input, setup, solution,
and output times take a total of one minute. It
answers the question, "How big a problem
can one run with this computer?" It is the
first benchmark to allow fair comparison of
the whole spectrum of computer speeds and
types, from the smallest personal computers
to the fastest vector and parallel supercomputers.
SLALOM is rapidly gaining acceptance
from computer manufacturers as a just
and realistic performance metric that has
been needed for some time.
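
The fixed-time principle can be sketched in a few lines. The code below is not SLALOM itself: it searches for the largest problem size that fits a time budget, and it replaces real timed runs with a hypothetical cost model (an n-cubed workload on a machine of a given speed) so that the result is reproducible. The names largest_problem and model_machine are assumptions made for this example.

```python
def largest_problem(run_time, budget_seconds):
    """Largest integer n with run_time(n) <= budget_seconds.

    Assumes run_time grows with n and that run_time(1) fits in the budget.
    """
    lo, hi = 1, 2
    # Grow the bracket until the upper size no longer fits in the budget.
    while run_time(hi) <= budget_seconds:
        lo, hi = hi, hi * 2
    # Bisect between a size that fits (lo) and one that does not (hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_time(mid) <= budget_seconds:
            lo = mid
        else:
            hi = mid
    return lo

def model_machine(ops_per_second):
    # Stand-in for a real timed run: an n-cubed workload on a machine of given speed.
    return lambda n: n ** 3 / ops_per_second

slow = largest_problem(model_machine(1e6), 60.0)
fast = largest_problem(model_machine(1e9), 60.0)
```

Under this model a machine one thousand times faster solves a problem only about ten times larger in the same minute. Reporting the problem size reached, rather than the time taken on a fixed problem, is what lets machines of very different speeds be compared on one scale.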

Work supported by DOE.



Research on software tools at Argonne
National Laboratory has been directed at the 
problem of aiding users who are adapting 
existing scientific and engineering computer 
programs to execute efficiently on parallel 
machines. The research is motivated by the 
enormous investments in expertise and comput- 
ing resources that such programs represent. 
Because the work of a community of users may 
depend on the effective application of a 
program, its adaptation to parallel execution 
has significant scientific and economic conse- 
quences. A collection of prototype tools that 
emerged from this research has been further 
developed into a commercial software product 
named VecPar_77 by the Numerical Algorithms
Group, Inc. (NAG), which will market the 
product worldwide. 

VecPar_77 is a precompiler tool that
analyzes and transforms Fortran programs. Its
function is illustrated by its application to the
parallelization of a computer program,
developed by scientists at Argonne, NASA, and
IBM, that uses multigrid methods to model
fluid flow in two dimensions. In the existing 
serial Fortran code, four types of operations 
are carried out by sweeping sequentially over 
rectangular subdomains called patches.
Parallelizing the code becomes a matter of
carrying out these operations concurrently over
the patches. VecPar_77 extracted information
about dependencies that inhibit parallel
execution of the key loops that carry out these
operations. Armed with this information, the
program developers made changes that enable
these loops to be executed in parallel. The
result is a significant performance improve-
ment on machines that permit such parallel
operation, e.g., the Sequent Symmetry.
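
The point about dependencies can be sketched in a few lines of Python (an illustration only, not the Argonne Fortran code or anything produced by the precompiler). The patch representation, the one-dimensional smoothing step, and all function names are assumptions made for this example. Because each patch is updated from its own old values only, no patch depends on another, and the sweep over patches can run concurrently with the same result as the serial sweep.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_patch(patch):
    """One Jacobi-style sweep over a 1-D patch (hypothetical operation).

    Each interior point is replaced by the average of its two old
    neighbors. Only old values are read, so nothing here depends on
    the result of any other patch.
    """
    new = list(patch)
    for i in range(1, len(patch) - 1):
        new[i] = 0.5 * (patch[i - 1] + patch[i + 1])
    return new


def sweep_serial(patches):
    """Visit the patches one after another, as in the serial code."""
    return [smooth_patch(p) for p in patches]


def sweep_parallel(patches, workers=4):
    """Visit the patches concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(smooth_patch, patches))
```

Threads are used here only to show the structure; in CPython they would not speed up a pure-Python loop. If `smooth_patch` instead read values that another patch had just written, the two sweeps could disagree, which is the kind of dependence a developer has to remove before the loop can be run in parallel.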

The commercial development of VecPar_77
was a cooperative endeavor in which one of the 
Argonne scientists who had participated in the 
research took leave to act as a consultant to 
NAG. His efforts were partially supported by a 
technology commercialization grant from the 
State of Illinois, administered by the Technol- 
ogy Transfer Center at Argonne. 

Work supported by DOE and the State of Illinois.

Scientists at Argonne National Laboratory
are determining the feasibility of using
logic programming on high-performance
computers to study two- and three-dimensional
structure in molecular biology.

A group at Argonne has devised a computer
program that exploits logic programming
techniques to match up to 500 sequences
of DNA/RNA (basic genetic building blocks of
organisms). It is the first such automated code
to handle more than a modest number (about
ten) of sequences. The program, which is
based on a combination of the logic programming
language Prolog and a lower-level
language, C, runs efficiently on the Sequent
Symmetry machine and a network of Sun
workstations.

The Argonne group, in collaboration with
scientists at the University of Illinois in
Urbana, has also developed a computer-based
procedure that accurately co[...]
structure of ribosomal RNA. Ribosomal RNA
forms the core of the translation apparatus,
the machinery that translates genetic information
into protein sequences. It thus lies at the heart
of the cell's function and replication.

Argonne's automated procedure has
already proved successful in several ways.
Previously undetected secondary structure
has been identified in positions 63 and 104 of
E. coli 16S ribosomal RNA (shown on the
diagram in red). Moreover, new tertiary
structure has been discovered in 7S RNA.
Since the chemical activity of ribosomes is
involved in a number of processes, such as
building and repairing tissue and
the immune system's response to diseases,
understanding the structure of ribosomal RNA
is of considerable scientific and medical
interest.

Work supported by DOE.

be used and expanded by others and with
capabilities that are clearly and accurately 
delimited as an integral part of the code. 

Parallel Reasoning Systems

HPCC offers tremendous potential for nonnu-
merical applications. We anticipate that future-
generation supercomputers will actually
function as scientific assistants, performing
routine tasks with great efficiency and speed
and making informed decisions on the basis of
accumulated experience. We need to explore
the introduction of parallel programming
paradigms into automated reasoning and logic
programming.

HUMAN RESOURCES

The long-term viability of U.S. competitiveness
requires a workforce highly educated in 
engineering and scientific disciplines. In recent 
years traditional theoretical and experimental 
techniques in science and engineering have 
been augmented by a powerful new technique: 
computational science. Scientists and engineers 
now use computers for basic research in 
complex physical, biological, and chemical 
phenomena, as well as in the design and 
manufacturing of complex engineering systems 
including automobiles, airplanes, nuclear 
reactors, and computers. 

While theory and experimental methods will 
not be replaced by computational science, they 
are being augmented by it in important ways. 
Computer simulations bridge theory and ex-
periment by simultaneously considering sophis-
ticated models and controlled study of complex
results. Computing also glues scientific disci- 
plines together. Sophisticated tools are required 
to solve the basic equations on a computer, in-
dependent of the particular application. For ex-
ample, the equations used to approximate wing
flow also describe gas flow in automobile en- 
gines, blood flow in the heart, waves in the 
ocean, hydraulic machinery, fluids in computer 
disk drives, flow of polymers in material sci- 
ence, motion of oil in the earth, spread of pol- 
lutants in the atmosphere, microchip fabrica- 
tion, and the formation of metal alloys, as well 
as many other complex phenomena. A large 
body of expertise is common to these seem- 
ingly different applications. 

Currently, computational science programs 
are emerging in a small number of U.S. univer- 
sities. DOE laboratories have long been leaders 
in developing computational science techniques 
and in training limited numbers of postdoctoral
researchers. However, the number of postdoc-
toral visitors at the national laboratories in
mathematics, computational science, and any 
form of engineering is very low relative to 
other areas. 

In the future, high school science and mathe-
matics courses will introduce computational
science techniques, students will use them dur-
ing college, and they will continue to develop
them throughout their professional lives. DOE 
will play a substantial role in supporting our 
educational system and in helping to provide 
the highly specialized training beyond that 
which is possible within traditional degree pro- 
grams. The following are ways that DOE plans 
to contribute: 

High School Programs to Attract Talented 
Students 

DOE will significantly expand its current 
summer program for inviting high school 
students and high school teachers to visit 
national laboratories. In addition, it will 
establish close ties with high school teachers to 
assist in appropriate curriculum development, 
to provide guest lecturers, and to enhance their 
use of computers in the classroom, including 
access to supercomputers. This type of early 
exposure to computing should draw more 
students into computational science. In addi- 
tion, it will generate a U.S. workforce that 
better understands how computers can improve 
their productivity, regardless of their ultimate 
career choice. 

Undergraduate Programs to Augment 
Educational Training 

DOE laboratories currently have limited 
summer and co-op programs for college
students that provide an opportunity for 
students to see how their formal training 
applies in practice. These programs also 
provide valuable feedback and support to 
schools on the status of their programs. DOE
plans to enlarge these programs and extend
them to top students from a greater number of 
colleges and universities. The emerging 
university graduate programs in computational 
science will need support in the next decade to 
evolve undergraduate programs. This support 
will include access to advanced computers for 
classroom teaching, summer jobs in computa- 
tional science for undergraduates, and develop- 
ment of course materials that cut across 
traditional departments. 



Graduate Programs to Focus Students on
Research Opportunities

At the graduate level, we need to interest the 
best students in solving tough, important
problems. DOE can contribute to this process
in several ways, including jointly funded
research projects; participation of DOE
researchers on Ph.D. thesis committees;
sponsorship of research and curriculum
development workshops; creation of pre- and
postdoctoral fellowships for computational
science in universities; and establishment of
sabbatical opportunities for faculty and their
students. Many of these activities take place
now; additional DOE funding through the
HPCC Program will permit them to take place 
on a much larger scale. 

Post-Graduate Programs to [...] Research

New Ph.D.s have the skill and knowledge to
tackle the enabling technology needs and grand 
challenges of the HPCC Program. DOE has 
started a postdoctoral program at the labs to 
strengthen research efforts in key areas. For the
HPCC Program, it is expected that the number
of DOE-funded postdoctoral positions will
increase substantially at the national laborato-
ries and at universities.

University/Industry Exchanges to Expand
Technology Transfer

Technology transfer addresses those people 
already in the workforce. DOE must provide a
wide range of training to expand staff skills in
the rapidly changing area of computational
science. DOE can also use its existing expertise
by encouraging temporary personnel ex-
changes. DOE staff could help build new
programs in computational science within high 
schools and universities while exposing faculty 
to the nature and needs of the national laborato- 
ries. Likewise, exchanges with industry could 
quickly accelerate the process of taking
technology developed at the laboratories and
putting it to use in industry. In the past, political,
social, and legislative obstacles have severely
limited this type of activity; with the HPCC 
Program, DOE hopes to change this situation. 



The Computational Science Graduate
Fellowship Program, sponsored by the DOE's
Office of Energy Research, is designed to 
support highly capable science and engineering 
students interested in pursuing doctoral study in 
an applied science or an engineering discipline 
with applications in high performance comput-
ing. The program provides a stipend ($18,000
the first year, $19,200 the second year, $20,400
the third year, and $21,600 the fourth year),
tuition and fees, an institutional allowance
($1,600 a year), some travel expenses, and
funding for a workstation. 

The program is open to U.S. citizens who
have a bachelor's degree in life or physical
sciences, engineering, or mathematics. Eligible
applicants may be entering graduate school or
have extensive graduate study. Students who
have received department (faculty) approval of a
Ph.D. thesis topic are not eligible for the
program.



Universities must apply and be approved for 
participation in the program. Acceptance will 
be based on material submitted in the univer- 
sity application. These materials include a 
description of the curriculum, enrollment data, 
previous and ongoing research, postgraduate 
employment record, and faculty résumés.

Student applications consist of undergradu-
ate and graduate transcripts (GRE scores will
be required after the 1991-1992 cycle), faculty
references, an academic and career goal 
statement, and a list of work experiences and 
publications.

All fellows are required to work at a DOE 
or a DOE-approved facility in a research 
assignment related to ongoing high perfor- 
mance computing activities. 

For application forms and more informa- 
tion, contact 

Computational Science Graduate
Fellowship Program
Science/Engineering Education Division
Oak Ridge Associated Universities
P.O. Box 117, Oak Ridge, TN 37830-0117
(615) 576-0128; Telefax (615) 576-0202

CONCLUSION

This document describes the DOE 
program component of the Federal 
High Performance Computing and 
Communications (HPCC) Program. 
The HPCC Program features increased 
cooperation between industry, academia, and
government and defines a strategy for support-
ing advances through directed R&D efforts,
reduced uncertainties to industry for develop-
ment and use of new technologies, support for
underlying research, network, and computa-
tional infrastructures, and support for the U.S.
human resource base to meet the needs of 
industry, academia, and government. 

The HPCC Program consists of four 
complementary components in key areas of 
HPCC. These areas are HPCC systems, 
advanced software technology and algorithms,
the National Research and Education Network,
and basic research and human resources. The 
HPCC Program is driven by the need for 
unprecedented computational power to investi-
gate and understand a wide range of scientific
and engineering problems that are referred to as
grand challenges and that are the fundamental
problems of investigation for the mission of
DOE and the other participating agencies.

The DOE has a long history of reliance on 
and excellence in high performance computing 
and computational science. Consequently, DOE
fully embraces the goals of the HPCC Program
and intends to mobilize its scientific commu-
nity to help accomplish them.

Several of the themes that will pervade 
DOE's HPCC Program are already clear; others
will emerge as the program is implemented and
evolves. The program will be applications
driven but highly interdisciplinary, involving
extensive teaming of scientists and engineers
with mathematicians and computer scientists.
Collaboration with industry and universities 
will be strongly encouraged. HPC Research 
Centers will provide a focus for HPCC Pro- 
gram intellectual activities. Computational 
science education will be emphasized to ensure
future human resources for the program. 

DOE will make use of the field work 
proposal system to fund HPCC activities at the 
national laboratories. Unsolicited proposals 
submitted under the Special Research Grant 
Program (see DOE/ER-0249, October 1985) 
will be used to fund HPCC Program activities 
at other organizations.